[ https://issues.apache.org/jira/browse/PDFBOX-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13948169#comment-13948169 ]

Tilman Hausherr commented on PDFBOX-1086:
-----------------------------------------

I fixed two of the three decoders with respect to fill bits. (I could fix the
third one, but I would prefer to have a test file.) Now only PDFBOX-457 is
left. It could be an EOL, but I can neither prove nor disprove that theory.
An EOL would make no sense in a G4-encoded document, at least according to
Wikipedia:
https://en.wikipedia.org/wiki/Group_4_compression
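
A minimal sketch of what tolerating fill bits can mean for a decoder,
assuming MSB-first bit order: an EOL is the 12-bit code 000000000001, and
fill bits are extra 0s placed in front of it (for example, so that the EOL
ends on a byte boundary). So any run of at least 11 zeros terminated by a 1
can be consumed as "fill + EOL". The class and method names (EolScanner,
skipEol) are hypothetical and not PDFBox API.

```java
/**
 * Hypothetical sketch (not PDFBox API): MSB-first bit reader plus a tolerant
 * EOL check that also swallows any fill bits (extra 0s) in front of the EOL.
 */
class EolScanner {
    private final byte[] data;
    private int bitPos; // index of the next bit to read, MSB-first

    EolScanner(byte[] data) {
        this.data = data;
    }

    /** Returns the next bit (0 or 1), or -1 at the end of the data. */
    int readBit() {
        if (bitPos >= data.length * 8) {
            return -1;
        }
        int b = data[bitPos >> 3] & 0xFF;
        int bit = (b >> (7 - (bitPos & 7))) & 1;
        bitPos++;
        return bit;
    }

    int position() {
        return bitPos;
    }

    /**
     * EOL = 11 zeros followed by a 1. Fill bits are additional leading zeros,
     * so any run of at least 11 zeros ending in a 1 counts as fill + EOL.
     * If no EOL is found, the read position is restored.
     */
    boolean skipEol() {
        int start = bitPos;
        int zeros = 0;
        int bit;
        while ((bit = readBit()) == 0) {
            zeros++;
        }
        if (bit == 1 && zeros >= 11) {
            return true;
        }
        bitPos = start; // not an EOL: rewind so the caller can decode a code word
        return false;
    }
}
```

With the bytes 0x00 0x01, the first four zeros are fill bits and the EOL ends
exactly at bit 16; a run of only 10 zeros followed by a 1 is rejected and the
read position is restored.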

> Error when decoding CCITT compressed data that contains EOLs, fill bits etc.
> ----------------------------------------------------------------------------
>
>                 Key: PDFBOX-1086
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1086
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>            Reporter: Jeremias Maerki
>            Assignee: Jeremias Maerki
>              Labels: CCITTFaxDecode, ccitt
>
> The TIFFFaxDecoder class (originally from JAI via XML Graphics Commons)
> does not handle cases such as EOLs between lines or in front of the first
> line. The PDF CCITTFaxDecode filter, however, needs to accept many different
> variants of the encoding. TIFF apparently restricts how CCITT data may be
> encoded, so TIFFFaxDecoder was not written to be as flexible as we need it.
> Ideally, PDFBox should handle anything that gets thrown at it.
> It appears that it would be rather difficult to retrofit TIFFFaxDecoder with
> the necessary flexibility, so new decoders for T.4 and T.6 should probably
> be written.
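
To illustrate the encoding variants the PDF filter has to accept, here is a
hedged sketch of how the CCITTFaxDecode parameters K, EndOfLine and
EncodedByteAlign could select between a T.4 and a T.6 decoder. Only the
parameter meanings come from the PDF specification (K < 0: pure 2D, Group 4;
K = 0: pure 1D, Group 3; K > 0: mixed 1D/2D, Group 3; EOLs are always
accepted but required only when EndOfLine is true). The class CcittParams
and its methods are hypothetical, and the byte-alignment helper assumes the
simplest reading of EncodedByteAlign (each line starts on a byte boundary).

```java
/**
 * Hypothetical sketch (not PDFBox API): interpret the PDF CCITTFaxDecode
 * parameters to decide which decoder applies and what it must tolerate.
 */
class CcittParams {
    enum Scheme { T4_1D, T4_MIXED, T6 }

    final int k;                    // /K, PDF default 0
    final boolean endOfLine;        // /EndOfLine, PDF default false
    final boolean encodedByteAlign; // /EncodedByteAlign, PDF default false

    CcittParams(int k, boolean endOfLine, boolean encodedByteAlign) {
        this.k = k;
        this.endOfLine = endOfLine;
        this.encodedByteAlign = encodedByteAlign;
    }

    /** K < 0: pure 2D (T.6); K = 0: pure 1D (T.4); K > 0: mixed 1D/2D (T.4). */
    Scheme scheme() {
        if (k < 0) {
            return Scheme.T6;
        }
        return k == 0 ? Scheme.T4_1D : Scheme.T4_MIXED;
    }

    /** EOLs are always accepted; they are mandatory only if EndOfLine is true. */
    boolean eolRequired() {
        return endOfLine;
    }

    /** With EncodedByteAlign, round the bit position up to the next byte boundary. */
    int lineStartBit(int bitPos) {
        return encodedByteAlign ? (bitPos + 7) & ~7 : bitPos;
    }
}
```

Because EOLs are accepted even when EndOfLine is false, every variant's
decoder front end would still need an EOL check like the one PDFBox lacks
in TIFFFaxDecoder.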



--
This message was sent by Atlassian JIRA
(v6.2#6252)
