[ https://issues.apache.org/jira/browse/PDFBOX-4781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17063973#comment-17063973 ]
Tilman Hausherr commented on PDFBOX-4781:
-----------------------------------------

You can send me the PDF to tilman at snafu dot de. Btw you can open byte arrays directly. And the current version is 2.0.19.

> PDF files with invalid compressed streams cannot be rendered
> ------------------------------------------------------------
>
>                 Key: PDFBOX-4781
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4781
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 2.0.18
>            Reporter: Arnaud Jeansen
>            Priority: Major
>
> I am using pdfbox 2.0.18 to generate thumbnails of PDF files.
> The code is basically as follows
> {code:java}
> byte[] pdfFile = ...;
> float dpi = 72L;
> try (PDDocument pdfDocument = PDDocument.load(new ByteArrayInputStream(pdfFile))) {
>     PDFRenderer pdfRenderer = new PDFRenderer(pdfDocument);
>     return pdfRenderer.renderImageWithDPI(0, dpi, ImageType.RGB);
> } catch (IOException e) {
>     // Error handling
> }
> {code}
> This works fine but for a few PDF files with an invalid compressed stream.
> Note: These PDF files open fine with a variety of PDF readers and java
> libraries. Only pdfbox seems to fail on them.
> For those files, I get an error log "FlateFilter: stop reading corrupt stream
> due to a DataFormatException" *and* an `IOException` with stacktrace
> {noformat}
> Caused by: java.io.IOException: java.util.zip.DataFormatException: invalid distance too far back
>     at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:58)
>     at org.apache.pdfbox.filter.Filter.decode(Filter.java:87)
>     at org.apache.pdfbox.cos.COSInputStream.create(COSInputStream.java:84)
>     at org.apache.pdfbox.cos.COSStream.createInputStream(COSStream.java:175)
>     at org.apache.pdfbox.cos.COSStream.createInputStream(COSStream.java:163)
>     at org.apache.pdfbox.pdmodel.PDPage.getContents(PDPage.java:170)
>     at org.apache.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:92)
>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:499)
>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:483)
>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:156)
>     at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:269)
>     at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:321)
>     at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:243)
>     at org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:229)
>     at com.foocompany.service.PdfImageService.convertFromPdfBinaryToJpegBinary(PdfImageService.java:167)
>     ... 68 common frames omitted
> Caused by: java.util.zip.DataFormatException: invalid distance too far back
>     at java.util.zip.Inflater.inflateBytes(Native Method)
>     at java.util.zip.Inflater.inflate(Inflater.java:259)
>     at java.util.zip.Inflater.inflate(Inflater.java:280)
>     at org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:83)
>     at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:50)
>     ... 82 common frames omitted
> {noformat}
> Looking further into `org.apache.pdfbox.filter.FlateFilter` :
> * The underlying `DataFormatException` (= broken content that cannot be
> decompressed when reading the stream) is forwarded up *only* if nothing could
> be read from this stream
> (see FlateFilter#decompress)
> * The `DataFormatException` is wrapped unconditionally into an `IOException`.
> (see FlateFilter#decode)
> As a hack, swallowing `DataFormatException` in `FlateFilter#decode` makes
> things work. I get an error log but a thumbnail is correctly generated.
> I am not sure how to proceed from here. I am willing to write a patch but I
> am not sure how to address this issue.
> I can also provide a PDF file that exhibits the problem privately.
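
For illustration, the policy described in the report (keep whatever was decoded and only propagate the `DataFormatException` when nothing could be read) can be sketched with plain `java.util.zip`. This is not PDFBox's FlateFilter code. The class and method names (`LenientInflate`, `inflateLeniently`, `deflate`) are hypothetical, and the corruption is simulated by damaging the zlib checksum rather than by reproducing the "invalid distance too far back" error from the stack trace.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper, not part of PDFBox.
public class LenientInflate {

    // Inflates zlib data and keeps the bytes decoded before a corruption.
    // The exception is only propagated (wrapped) if nothing was decoded.
    public static byte[] inflateLeniently(byte[] compressed) throws IOException {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[2048];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0) {
                    // Truncated input (or a preset dictionary is needed): stop.
                    break;
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            if (out.size() == 0) {
                // Nothing usable was read, so report the failure.
                throw new IOException(e);
            }
            // Otherwise keep the partial result. Bytes produced by the
            // failing inflate() call itself may be lost.
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    // Compresses data into a complete zlib stream (used to build test input).
    public static byte[] deflate(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[2048];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] original = new byte[10000];
        Arrays.fill(original, (byte) 'a');

        byte[] compressed = deflate(original);
        // Damage the Adler-32 trailer so the stream fails its final check.
        compressed[compressed.length - 1] ^= 0xFF;

        byte[] partial = inflateLeniently(compressed);
        System.out.println("original bytes:  " + original.length);
        System.out.println("recovered bytes: " + partial.length);
    }
}
```

Whether a truncated result is acceptable depends on the caller. For a thumbnail it may be fine, while for text extraction the missing tail could matter, so a real fix would probably still log the error.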