[ 
https://issues.apache.org/jira/browse/PDFBOX-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074475#comment-14074475
 ] 

Tilman Hausherr commented on PDFBOX-2238:
-----------------------------------------

No, it is the other way around. The nonSeq parser is the correct one. The old
(and default) parser uses an "easy" but wrong method. However, in a test with
Tim from TIKA we noticed that the old parser is slightly more successful with
malformed PDFs.
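
One way to act on this trade-off is to try the correct parser first and fall
back to the more forgiving one only when it fails. The following is a minimal
sketch of that idea, not PDFBox code: the class ParserFallback, the interface
Loader and the method loadWithFallback are hypothetical names introduced here
for illustration, and the two loaders stand in for whatever parser calls the
PDFBox version in use provides.

```java
import java.io.IOException;

/** Hypothetical helper (not part of PDFBox): strict parser first, lenient parser second. */
final class ParserFallback {

    /** Stand-in for a parser call that may fail with an IOException. */
    interface Loader<T> {
        T load() throws IOException;
    }

    private ParserFallback() {
    }

    /**
     * Runs the strict loader; if it throws, runs the lenient loader instead.
     * If both fail, the lenient failure is thrown and the strict failure is
     * attached to it as a suppressed exception so neither cause is lost.
     */
    static <T> T loadWithFallback(Loader<T> strict, Loader<T> lenient) throws IOException {
        try {
            return strict.load();
        } catch (IOException strictFailure) {
            try {
                return lenient.load();
            } catch (IOException lenientFailure) {
                lenientFailure.addSuppressed(strictFailure);
                throw lenientFailure;
            }
        }
    }
}
```

Assuming the 1.8-style entry points are available, a call could look like
loadWithFallback(() -> PDDocument.loadNonSeq(file, null), () -> PDDocument.load(file)).
Whether a silent fallback is desirable depends on the application, since the
old parser may accept a file and still decode it incorrectly.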

> DataFormatException: incorrect header check
> -------------------------------------------
>
>                 Key: PDFBOX-2238
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2238
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 2.0.0
>            Reporter: simon steiner
>            Assignee: Tilman Hausherr
>
> PDF from PDFBOX-186
> java -cp lib/levigo-jbig2-imageio-1.6.0.jar:lib/jai_imageio.jar:pdf-box-svn/app/target/pdfbox-app-2.0.0-SNAPSHOT.jar org.apache.pdfbox.tools.WriteDecodedDoc PwC-Tech-Forecast-Spring-2009.pdf
> Exception in thread "main" java.io.IOException: java.util.zip.DataFormatException: incorrect header check
>       at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:83)
>       at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:365)
>       at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:278)
>       at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:189)
>       at org.apache.pdfbox.tools.WriteDecodedDoc.doIt(WriteDecodedDoc.java:121)
>       at org.apache.pdfbox.tools.WriteDecodedDoc.main(WriteDecodedDoc.java:192)
> Caused by: java.util.zip.DataFormatException: incorrect header check
>       at java.util.zip.Inflater.inflateBytes(Native Method)
>       at java.util.zip.Inflater.inflate(Inflater.java:259)
>       at java.util.zip.Inflater.inflate(Inflater.java:280)
>       at org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:101)
>       at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:65)
>       ... 5 more
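
For readers unfamiliar with the message in the trace above: 
java.util.zip.Inflater reports "incorrect header check" when the first two
bytes it is given do not form a valid zlib header. The sketch below reproduces
that message with the JDK alone. It only illustrates the error itself; that
the parser handed FlateFilter bytes from the wrong position in this particular
PDF is an assumption, not something the report confirms. HeaderCheckDemo and
its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical demo class: shows when Inflater rejects its input header. */
final class HeaderCheckDemo {

    private HeaderCheckDemo() {
    }

    /** Compresses raw bytes into a zlib-wrapped stream (small inputs only). */
    static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        byte[] buffer = new byte[raw.length + 64];
        int length = deflater.deflate(buffer);
        deflater.end();
        return Arrays.copyOf(buffer, length);
    }

    /** Tries to inflate the data and describes the outcome as a string. */
    static String tryInflate(byte[] data) {
        Inflater inflater = new Inflater(); // expects a zlib header by default
        inflater.setInput(data);
        byte[] out = new byte[256];
        try {
            int length = inflater.inflate(out);
            return "ok: " + length + " bytes";
        } catch (DataFormatException e) {
            return "failed: " + e.getMessage();
        } finally {
            inflater.end();
        }
    }
}
```

With this sketch, tryInflate(deflate(bytes)) succeeds for a small input, while
passing plain ASCII text such as "Hello" fails on the header check before any
data is decoded.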



--
This message was sent by Atlassian JIRA
(v6.2#6252)
