Andreas,

Sounds good. If you could ping me on TIKA-1442, I'll be sure to hear the message in a timely fashion. :)

I just tried to build Tika with 1.8.8-SNAPSHOT, and I found a problem with the non-sequential parser on one of our test files (http://svn.apache.org/repos/asf/tika/trunk/tika-parsers/src/test/resources/test-documents/testPDF_protected.pdf).

This is the stack trace with pdfbox-app-1.8.8-20141124.081221-143.jar's ExtractText -nonSeq:

Nov 24, 2014 8:48:06 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Nov 24, 2014 8:48:06 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Nov 24, 2014 8:48:06 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Nov 24, 2014 8:48:06 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Nov 24, 2014 8:48:06 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Nov 24, 2014 8:48:06 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Nov 24, 2014 8:48:06 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Nov 24, 2014 8:48:06 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Nov 24, 2014 8:48:06 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
Nov 24, 2014 8:48:06 AM org.apache.pdfbox.filter.FlateFilter decode
SEVERE: FlateFilter: stop reading corrupt stream due to a DataFormatException
ExtractText failed with the following exception:
java.io.IOException
        at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:109)
        at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:379)
        at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:291)
        at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:225)
        at org.apache.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:117)
        at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:251)
        at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
        at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
        at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:480)
        at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:405)
        at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:364)
        at org.apache.pdfbox.ExtractText.startExtraction(ExtractText.java:275)
        at org.apache.pdfbox.ExtractText.main(ExtractText.java:85)
        at org.apache.pdfbox.PDFBox.main(PDFBox.java:58)
Caused by: java.util.zip.DataFormatException: incorrect header check
        at java.util.zip.Inflater.inflateBytes(Native Method)
        at java.util.zip.Inflater.inflate(Inflater.java:259)
        at java.util.zip.Inflater.inflate(Inflater.java:280)
        at org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:128)
        at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:101)
        ... 13 more

-----Original Message-----
From: Andreas Lehmkühler [mailto:andr...@lehmi.de]
Sent: Monday, November 24, 2014 8:39 AM
To: dev@pdfbox.apache.org
Subject: RE: PDFBox 1.8.8. release

Hi,

> "Allison, Timothy B." <talli...@mitre.org> wrote on 24 November 2014 at 13:10:
>
>
> Let me know when to hit "run"...
Thanks for the offer, there is just one thing related to PDFBOX-2430 I'd like to
fix this evening ......

BR
Andreas Lehmkühler
>
> -----Original Message-----
> From: Andreas Lehmkuehler [mailto:andr...@lehmi.de]
> Sent: Sunday, November 23, 2014 12:27 PM
> To: dev@pdfbox.apache.org
> Subject: Re: PDFBox 1.8.8. release
>
> Hi,
>
> On 23.11.2014 at 17:55, Tilman Hausherr wrote:
> > Hi.
> >
> > I'd prefer to wait for the tests of Tim Allison... unless you want to live
> > with the risk that he does the tests, and that we find a "big problem"
> > within that 3 day voting period...
> Good point.
>
> > I haven't asked him to do these tests yet, because so much work was done on
> > both parsers.
> I guess I'm done with parser changes at least in the 1.8 branch
>
> > Tilman
>
> BR
> Andreas Lehmkühler
>
> >
> >
> > On 23.11.2014 at 17:14, Andreas Lehmkuehler wrote:
> >> Hi,
> >>
> >> On 11.11.2014 at 12:15, Andreas Lehmkühler wrote:
> >>> Hi,
> >>>
> >>>> Andreas Lehmkühler <andr...@lehmi.de> wrote on 3 November 2014 at 11:52:
> >>>>
> >>>>
> >>>> Hi,
> >>>>
> >>>> there are again a number of solved issues and I'm thinking about a new
> >>>> bugfix release. How about a new one next week, maybe later if someone
> >>>> wants to get some additional things done before?
> >>> Looks like I won't have the time this week to cut the release, sorry.
> >>> I'm not sure if I'll find some time when attending ApacheCon in Budapest
> >>> next week,
> >>> but I should have some cycles in the last week of November.
> >>>
> >>> This will buy us some time to fix some of the encryption/decryption
> >>> issues.
> >> I'm going to cut the release tomorrow in the evening, round about 24 hours
> >> from now. Any objections?
> >>
> >>
> >> BR
> >> Andreas Lehmkühler
> >
>