[ 
https://issues.apache.org/jira/browse/PDFBOX-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522404#comment-14522404
 ] 

Tilman Hausherr edited comment on PDFBOX-2779 at 4/30/15 10:44 PM:
-------------------------------------------------------------------

The decoder that we use (which is an older version of a Sun decoder) doesn't 
recover from corrupt data, but it should:
http://www.fileformat.info/mirror/egff/ch09_05.htm
{quote}
If corruption of the data transmission occurs, only K-1 scan lines of data will 
be lost. The decoder will be able to resync the decoding at the next available 
EOL code. 
{quote}
The code where the exception is thrown does not do that; it just fails.
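What follows is a minimal sketch of the resync that the quoted passage describes. It is not PDFBox code; `EolResync` and `findResyncPoint` are hypothetical names. The sketch assumes that bits are read MSB-first and that the stream uses the Group 3 EOL code, which is eleven 0 bits followed by a 1 (fill bits may add more zeros). With K > 0, each EOL is followed by a tag bit: 1 means the next line is 1D coded, 0 means it is 2D coded. A 2D line is coded relative to the previous line, which may be corrupt, so the sketch can skip ahead to the next 1D line.

```java
// Hypothetical sketch, not PDFBox API: finds the point where a CCITT Group 3
// decoder could resynchronize after corrupt data.
final class EolResync {

    private EolResync() {
    }

    /** Returns the bit at absolute index pos, reading each byte MSB first. */
    private static int bitAt(byte[] data, long pos) {
        int b = data[(int) (pos >>> 3)] & 0xFF;
        return (b >>> (7 - (int) (pos & 7))) & 1;
    }

    /**
     * Scans forward from bit index 'from' for the next EOL code: at least 11
     * zero bits followed by a 1 bit (fill bits may add extra zeros).
     *
     * @param twoD        true if K > 0, i.e. every EOL is followed by a 1-bit tag
     *                    (1 = next line is 1D coded, 0 = next line is 2D coded)
     * @param requireOneD if true, skip EOLs whose tag is 0, because a 2D line is
     *                    coded relative to the (possibly corrupted) previous line
     * @return bit index of the first data bit of the line to resume at, or -1
     */
    static long findResyncPoint(byte[] data, long from, boolean twoD, boolean requireOneD) {
        long totalBits = (long) data.length * 8;
        int zeros = 0;
        for (long pos = from; pos < totalBits; pos++) {
            if (bitAt(data, pos) == 0) {
                zeros++;
                continue;
            }
            if (zeros >= 11) {              // this 1 bit completes an EOL
                long next = pos + 1;
                if (!twoD) {
                    return next;            // 1D only: no tag bit follows
                }
                if (next >= totalBits) {
                    return -1;              // EOL at end of data, tag bit missing
                }
                if (bitAt(data, next) == 1 || !requireOneD) {
                    return next + 1;        // skip the tag bit
                }
                pos = next;                 // tag 0: keep scanning after it
            }
            zeros = 0;
        }
        return -1;
    }
}
```

When a decoder using this approach hits an error such as "EOL encountered in white run", it would fill the rows it could not decode (for example, as white) and then continue from the returned position. With K > 0, a 1D-coded line can be followed by at most K-1 2D-coded lines, which matches the K-1 bound in the quote. This is a heuristic: corrupt data can still contain a bit pattern that looks like an EOL.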

I looked around for other projects that use the same decoder and found this one, 
in a project that is probably defunct:
https://java.net/projects/pdf-renderer/sources/svn/content/trunk/src/com/sun/pdfview/decode/CCITTFaxDecoder.java?rev=140

That version does decode the file, but it is LGPL-licensed, so we can't use it. 
It has numerous changes compared to our version.


was (Author: tilman):
The decoder that we use (which is an older version of a Sun decoder) doesn't 
recover from corrupt data, but it should:
http://www.fileformat.info/mirror/egff/ch09_05.htm
{quote}
If corruption of the data transmission occurs, only K-1 scan lines of data will 
be lost. The decoder will be able to resync the decoding at the next available 
EOL code. 
{quote}
The code where the exception is thrown does not do that; it just fails.

I looked around for other projects that use the same decoder and found this one:
https://java.net/projects/pdf-renderer/sources/svn/content/trunk/src/com/sun/pdfview/decode/CCITTFaxDecoder.java?rev=140

That version does decode the file, but it is LGPL-licensed, so we can't use it. 
It has numerous changes compared to our version.

> PDF to Image Conversion fails with "EOL encountered in white run"
> -----------------------------------------------------------------
>
>                 Key: PDFBOX-2779
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2779
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Rendering
>    Affects Versions: 1.8.9, 2.0.0
>            Reporter: Siegfried Goeschl
>              Labels: CCITTFaxDecode, ccitt
>         Attachments: eol-encountered-in-white-run-01.pdf
>
>
> One of my real-life PDFs throws an exception:
> pdfbox-1.8.9> ./pdf-to-image.sh pdf/eol-encountered-in-white-run-01.pdf
> java -jar pdfbox-app-1.8.9.jar PDFToImage pdf/eol-encountered-in-white-run-01.pdf
> Apr 28, 2015 9:07:51 PM org.apache.pdfbox.util.operator.pagedrawer.Invoke process
> SEVERE: java.lang.RuntimeException: EOL encountered in white run.
> java.lang.RuntimeException: EOL encountered in white run.
>       at org.apache.pdfbox.filter.TIFFFaxDecoder.decodeNextScanline(TIFFFaxDecoder.java:622)
>       at org.apache.pdfbox.filter.TIFFFaxDecoder.decode2D(TIFFFaxDecoder.java:767)
>       at org.apache.pdfbox.filter.CCITTFaxDecodeFilter.decode(CCITTFaxDecodeFilter.java:116)
>       at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:351)
>       at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:291)
>       at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:225)
>       at org.apache.pdfbox.pdmodel.graphics.xobject.PDCcitt.getRGBImage(PDCcitt.java:201)
>       at org.apache.pdfbox.util.operator.pagedrawer.Invoke.process(Invoke.java:87)
>       at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:557)
>       at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:268)
>       at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
>       at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
>       at org.apache.pdfbox.pdfviewer.PageDrawer.drawPage(PageDrawer.java:139)
>       at org.apache.pdfbox.pdmodel.PDPage.convertToImage(PDPage.java:801)
>       at org.apache.pdfbox.util.PDFImageWriter.writeImage(PDFImageWriter.java:130)
>       at org.apache.pdfbox.PDFToImage.main(PDFToImage.java:226)
>       at org.apache.pdfbox.PDFBox.main(PDFBox.java:96)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
