[ https://issues.apache.org/jira/browse/PDFBOX-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074475#comment-14074475 ]
Tilman Hausherr commented on PDFBOX-2238:
-----------------------------------------
No, it's the other way round: the nonSeq parser is the correct one. The old
(and default) parser uses an "easy" but wrong method. However, after a test
with Tim from TIKA, we noticed that the old parser is slightly more successful
with malformed PDFs.
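The comment does not say what the "easy" method is. As a purely illustrative toy (not PDFBox code; `scanForObjectHeaders` and `lookupViaXref` are hypothetical helpers, and generation numbers are fixed at 0), the sketch contrasts two ways of locating an object: scanning the file text front to back for its `N 0 obj` keyword, and looking it up through an xref-style table of byte offsets. The keyword scan can be fooled by matching bytes inside stream data; the offset lookup cannot. This is one way a sequential scan can go wrong, not necessarily the cause of this bug.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ObjectLookupSketch {

    // Hypothetical helper: records every position where "<objNum> 0 obj" appears,
    // without knowing whether that text is real syntax or just bytes inside a stream.
    static List<Integer> scanForObjectHeaders(String pdf, int objNum) {
        // The lookbehind stops "12 0 obj" from matching when objNum is 2.
        Pattern header = Pattern.compile("(?<![0-9])" + objNum + " 0 obj");
        Matcher m = header.matcher(pdf);
        List<Integer> hits = new ArrayList<>();
        while (m.find()) {
            hits.add(m.start());
        }
        return hits;
    }

    // Hypothetical helper: resolves an object through an offset table, the way a
    // cross-reference (xref) section maps object numbers to byte offsets.
    static int lookupViaXref(Map<Integer, Integer> xref, int objNum) {
        Integer offset = xref.get(objNum);
        if (offset == null) {
            throw new IllegalArgumentException("object " + objNum + " not in xref");
        }
        return offset;
    }

    public static void main(String[] args) {
        // Toy body: object 1's 7-byte stream data happens to be the text "2 0 obj".
        String obj1 = "1 0 obj\n<< /Length 7 >>\nstream\n2 0 obj\nendstream\nendobj\n";
        String obj2 = "2 0 obj\n(real object 2)\nendobj\n";
        String pdf = obj1 + obj2;

        // Toy xref: object numbers mapped to their true starting offsets.
        Map<Integer, Integer> xref = Map.of(1, 0, 2, obj1.length());

        // Prints [31, 56]: the first hit is inside object 1's stream data.
        System.out.println("scan hits for 2 0 obj: " + scanForObjectHeaders(pdf, 2));
        // Prints 56: the xref points straight at the real object 2.
        System.out.println("xref offset for 2 0 obj: " + lookupViaXref(xref, 2));
    }
}
```

A scanner that trusted the first hit would treat stream bytes as an object header and misread where object 1's stream ends; the offset table sidesteps that. A lenient scanner can still be the more forgiving choice when the xref itself is damaged, which is consistent with the observation above about malformed PDFs.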
> DataFormatException: incorrect header check
> -------------------------------------------
>
> Key: PDFBOX-2238
> URL: https://issues.apache.org/jira/browse/PDFBOX-2238
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 2.0.0
> Reporter: simon steiner
> Assignee: Tilman Hausherr
>
> PDF from PDFBOX-186
> java -cp lib/levigo-jbig2-imageio-1.6.0.jar:lib/jai_imageio.jar:pdf-box-svn/app/target/pdfbox-app-2.0.0-SNAPSHOT.jar org.apache.pdfbox.tools.WriteDecodedDoc PwC-Tech-Forecast-Spring-2009.pdf
> Exception in thread "main" java.io.IOException: java.util.zip.DataFormatException: incorrect header check
> at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:83)
> at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:365)
> at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:278)
> at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:189)
> at org.apache.pdfbox.tools.WriteDecodedDoc.doIt(WriteDecodedDoc.java:121)
> at org.apache.pdfbox.tools.WriteDecodedDoc.main(WriteDecodedDoc.java:192)
> Caused by: java.util.zip.DataFormatException: incorrect header check
> at java.util.zip.Inflater.inflateBytes(Native Method)
> at java.util.zip.Inflater.inflate(Inflater.java:259)
> at java.util.zip.Inflater.inflate(Inflater.java:280)
> at org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:101)
> at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:65)
> ... 5 more
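The root cause in the trace is `java.util.zip.Inflater` rejecting the first two bytes of the stream data. That can mean the filter was handed bytes that are not zlib data at all, for example raw deflate without a header, or bytes taken from the wrong place in the file. The sketch below uses only the JDK (no PDFBox code; `hasZlibHeader` and `lenientDecode` are hypothetical helpers). It shows the RFC 1950 header test behind this kind of error and a lenient fallback to raw deflate of the sort a forgiving decoder might use. The issue does not say whether PDFBox does anything like this.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class FlateHeaderSketch {

    // RFC 1950: a zlib stream starts with CMF and FLG. CM (low nibble of CMF) must be 8,
    // CINFO (high nibble) at most 7, and CMF * 256 + FLG must be a multiple of 31.
    static boolean hasZlibHeader(byte[] data) {
        if (data.length < 2) return false;
        int cmf = data[0] & 0xFF;
        int flg = data[1] & 0xFF;
        return (cmf & 0x0F) == 8 && (cmf >> 4) <= 7 && (cmf * 256 + flg) % 31 == 0;
    }

    // nowrap = false expects a zlib header; nowrap = true expects raw deflate data.
    static byte[] inflate(byte[] data, boolean nowrap) throws DataFormatException {
        Inflater inflater = new Inflater(nowrap);
        try {
            inflater.setInput(data);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf); // throws on a bad header when nowrap = false
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) break;
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            inflater.end();
        }
    }

    // Test-data helper: compresses with or without the zlib wrapper.
    static byte[] deflate(byte[] data, boolean nowrap) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, nowrap);
        try {
            deflater.setInput(data);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end();
        }
    }

    // Hypothetical lenient decode: zlib when the header is valid, raw deflate otherwise.
    static byte[] lenientDecode(byte[] data) throws DataFormatException {
        return inflate(data, !hasZlibHeader(data));
    }

    public static void main(String[] args) throws DataFormatException {
        byte[] text = "BT /F1 12 Tf (Hello) Tj ET".getBytes(StandardCharsets.US_ASCII);
        byte[] zlib = deflate(text, false);
        byte[] raw = deflate(text, true);

        System.out.println("zlib header on zlib data: " + hasZlibHeader(zlib));
        System.out.println("zlib header on raw data: " + hasZlibHeader(raw));
        try {
            inflate(raw, false); // strict decode of headerless data fails
        } catch (DataFormatException e) {
            System.out.println("strict decode failed: " + e.getMessage());
        }
        System.out.println("lenient: " + new String(lenientDecode(raw), StandardCharsets.US_ASCII));
    }
}
```

The strict path corresponds to the `Inflater.inflate` call at the bottom of the trace. The fallback trades correctness checks for robustness, which mirrors the trade-off in the comment above: lenient handling rescues some malformed files but can also mask bytes that were read from the wrong place.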