[ 
https://issues.apache.org/jira/browse/PDFBOX-2976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler updated PDFBOX-2976:
---------------------------------------
    Attachment: PDFBOX2976_FlateFilter.patch

The garbage at the beginning is not a problem, as the parser's self-repair 
mechanism simply ignores it. 

However, the content streams are all broken, and the flate filter throws an 
exception that stops the parser. I guess that behaviour is new; earlier 
versions swallowed that exception.
I've attached a patch that swallows the DataFormatException and reduces one of 
the buffer sizes so that as much of the data as possible is decompressed.
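
To illustrate the idea behind the patch, here is a minimal sketch using only java.util.zip. It is not the attached patch and not the PDFBox FlateFilter code: the class name LenientInflate, the method inflateLeniently and the 64-byte buffer size are assumptions chosen for illustration. The sketch inflates into a small buffer and keeps whatever was decoded before a DataFormatException is thrown. A smaller buffer means less data is lost, because the bytes decoded in the failing inflate() call are never returned to the caller.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

// Hypothetical helper for illustration only; not part of PDFBox.
final class LenientInflate {

    // Assumed value: a small output buffer limits how much already
    // decoded data is discarded when inflate() fails.
    private static final int BUFFER_SIZE = 64;

    private LenientInflate() {
    }

    // Inflates a zlib stream and returns everything that could be decoded.
    // A corrupt stream yields a (possibly empty) prefix of the data
    // instead of an exception.
    static byte[] inflateLeniently(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[BUFFER_SIZE];
        try {
            while (!inflater.finished()) {
                int count = inflater.inflate(buffer);
                if (count == 0
                        && (inflater.needsInput() || inflater.needsDictionary())) {
                    // Truncated input or a missing preset dictionary:
                    // no further progress is possible.
                    break;
                }
                out.write(buffer, 0, count);
            }
        } catch (DataFormatException e) {
            // Swallowed on purpose: keep the chunks decoded so far.
            // A real filter would probably log a warning here.
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }
}
```

Calling LenientInflate.inflateLeniently(bytes) on an intact stream returns the full data. On a stream with a bad Adler-32 trailer (the "incorrect data check" case from this issue) it returns the leading part that was decoded before the check failed. The trade-off is that damaged content is passed on silently, so the caller cannot tell a complete stream from a partial one unless the failure is reported some other way.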

> java.util.zip.DataFormatException: incorrect data check
> -------------------------------------------------------
>
>                 Key: PDFBOX-2976
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2976
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 2.0.0
>         Environment: Linux Mint 17.2 x64, JDK7u79, Glassfish 3.1.2.2
>            Reporter: Felix Rudolphi
>         Attachments: PDFBOX2976_FlateFilter.patch, sc-356376(1)-x.pdf, 
> sc-356376(1).pdf, sc-356376-x.pdf, sc-356376.pdf
>
>   Original Estimate: 3h
>  Remaining Estimate: 3h
>
> When trying to open certain PDF files (examples attached, also any MSDS 
> available at http://www.scbt.com/datasheet-356376.html ), an exception is 
> thrown resulting in the file not being parsed:
> java.io.IOException: java.util.zip.DataFormatException: incorrect data check
>       at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:83)
>       at org.apache.pdfbox.cos.COSInputStream.create(COSInputStream.java:78)
>       at org.apache.pdfbox.cos.COSStream.createInputStream(COSStream.java:160)
>       at 
> org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:143)
>       at org.apache.pdfbox.pdmodel.PDPage.getContents(PDPage.java:148)
>       at 
> org.apache.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:92)
>       at 
> org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:450)
>       at 
> org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:437)
>       at 
> org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:148)
>       at 
> org.apache.pdfbox.text.PDFTextStreamEngine.processPage(PDFTextStreamEngine.java:117)
>       at 
> org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:367)
>       at 
> org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:303)
>       at 
> org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:248)
>       at 
> org.apache.pdfbox.text.PDFTextStripper.getText(PDFTextStripper.java:209)
> -- or --
> java.io.IOException: java.util.zip.DataFormatException: incorrect data check
>       at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:83)
>       at org.apache.pdfbox.cos.COSInputStream.create(COSInputStream.java:78)
>       at org.apache.pdfbox.cos.COSStream.createInputStream(COSStream.java:160)
>       at 
> org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:143)
>       at org.apache.pdfbox.pdmodel.PDPage.getContents(PDPage.java:148)
>       at 
> org.apache.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:92)
>       at 
> org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:450)
>       at 
> org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:437)
>       at 
> org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:148)
>       at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:179)
>       at 
> org.apache.pdfbox.rendering.PDFRenderer.renderPage(PDFRenderer.java:205)
>       at 
> org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:136)



