[ https://issues.apache.org/jira/browse/PDFBOX-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522404#comment-14522404 ]
Tilman Hausherr edited comment on PDFBOX-2779 at 4/30/15 10:44 PM:
-------------------------------------------------------------------
The decoder that we use (which is an older version of a Sun decoder) doesn't
recover from corrupt data, but it should:
http://www.fileformat.info/mirror/egff/ch09_05.htm
{quote}
If corruption of the data transmission occurs, only K-1 scan lines of data will
be lost. The decoder will be able to resync the decoding at the next available
EOL code.
{quote}
The code that throws the exception does not do that; it just fails.
I looked around for other projects using the same decoder and found this version
in a project that is probably defunct:
https://java.net/projects/pdf-renderer/sources/svn/content/trunk/src/com/sun/pdfview/decode/CCITTFaxDecoder.java?rev=140
That version does decode the file, but it is LGPL-licensed, so we can't use it.
It also has numerous changes compared to our version.
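A minimal sketch of the resync step the quoted text describes, assuming Group 3 data with MSB-first bit order (FillOrder 1). `EolResync`, `findNextEol`, and `findNext1DLine` are hypothetical names, not PDFBox API. The class only locates a restart point and does not decode any runs. For 2D data (K > 0), it also skips forward to the next 1D-coded line, because a 2D line is coded relative to the line above it.

```java
/**
 * Hypothetical helper, not part of PDFBox: locates restart points in a CCITT
 * Group 3 bit stream so a decoder can resynchronize after corrupt data.
 * Assumes MSB-first bit order within each byte (FillOrder 1).
 */
final class EolResync {
    // EOL code is 000000000001 (eleven 0 bits, then a 1). Fill bits may add
    // extra leading zeros, so any run of >= 11 zeros followed by a 1 matches.
    private static final int EOL_ZERO_RUN = 11;

    private EolResync() {
    }

    /** Returns bit number bitPos of data, counting MSB-first. */
    static int bitAt(byte[] data, long bitPos) {
        int b = data[(int) (bitPos >>> 3)] & 0xFF;
        return (b >>> (7 - (int) (bitPos & 7))) & 1;
    }

    /**
     * Returns the bit position just past the next EOL at or after startBit,
     * or -1 if the data ends first.
     */
    static long findNextEol(byte[] data, long startBit) {
        long totalBits = (long) data.length * 8;
        int zeros = 0;
        for (long pos = startBit; pos < totalBits; pos++) {
            if (bitAt(data, pos) == 0) {
                zeros++;
            } else if (zeros >= EOL_ZERO_RUN) {
                return pos + 1; // first bit after the terminating 1
            } else {
                zeros = 0; // a 1 inside ordinary code data; restart the count
            }
        }
        return -1;
    }

    /**
     * For 2D data (K > 0), each EOL is followed by a tag bit: 1 means the next
     * line is 1D-coded, 0 means 2D-coded. After corruption, decoding can only
     * restart safely at a 1D-coded line. Returns the bit position where that
     * line's data starts, or -1 if none is found.
     */
    static long findNext1DLine(byte[] data, long startBit) {
        long totalBits = (long) data.length * 8;
        long pos = startBit;
        while (true) {
            long afterEol = findNextEol(data, pos);
            if (afterEol < 0 || afterEol >= totalBits) {
                return -1; // no EOL, or no room for the tag bit
            }
            if (bitAt(data, afterEol) == 1) {
                return afterEol + 1; // skip the tag bit
            }
            pos = afterEol + 1; // 2D line follows: keep scanning
        }
    }
}
```

In our decoder, the natural hook would be the per-line loop where `decodeNextScanline` currently throws. One option (an assumption, not necessarily what the pdf-renderer version does) is to catch the failure, call `findNext1DLine` from the current bit position, fill the skipped rows with white, and continue decoding from there.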
> PDF to Image Conversion fails with "EOL encountered in white run"
> -----------------------------------------------------------------
>
> Key: PDFBOX-2779
> URL: https://issues.apache.org/jira/browse/PDFBOX-2779
> Project: PDFBox
> Issue Type: Bug
> Components: Rendering
> Affects Versions: 1.8.9, 2.0.0
> Reporter: Siegfried Goeschl
> Labels: CCITTFaxDecode, ccitt
> Attachments: eol-encountered-in-white-run-01.pdf
>
>
> One of my real-life PDFs throws
> pdfbox-1.8.9> ./pdf-to-image.sh pdf/eol-encountered-in-white-run-01.pdf
> java -jar pdfbox-app-1.8.9.jar PDFToImage pdf/eol-encountered-in-white-run-01.pdf
> Apr 28, 2015 9:07:51 PM org.apache.pdfbox.util.operator.pagedrawer.Invoke process
> SEVERE: java.lang.RuntimeException: EOL encountered in white run.
> java.lang.RuntimeException: EOL encountered in white run.
> at org.apache.pdfbox.filter.TIFFFaxDecoder.decodeNextScanline(TIFFFaxDecoder.java:622)
> at org.apache.pdfbox.filter.TIFFFaxDecoder.decode2D(TIFFFaxDecoder.java:767)
> at org.apache.pdfbox.filter.CCITTFaxDecodeFilter.decode(CCITTFaxDecodeFilter.java:116)
> at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:351)
> at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:291)
> at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:225)
> at org.apache.pdfbox.pdmodel.graphics.xobject.PDCcitt.getRGBImage(PDCcitt.java:201)
> at org.apache.pdfbox.util.operator.pagedrawer.Invoke.process(Invoke.java:87)
> at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:557)
> at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:268)
> at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
> at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
> at org.apache.pdfbox.pdfviewer.PageDrawer.drawPage(PageDrawer.java:139)
> at org.apache.pdfbox.pdmodel.PDPage.convertToImage(PDPage.java:801)
> at org.apache.pdfbox.util.PDFImageWriter.writeImage(PDFImageWriter.java:130)
> at org.apache.pdfbox.PDFToImage.main(PDFToImage.java:226)
> at org.apache.pdfbox.PDFBox.main(PDFBox.java:96)
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)