[ https://issues.apache.org/jira/browse/PDFBOX-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13948169#comment-13948169 ]
Tilman Hausherr commented on PDFBOX-1086:
-----------------------------------------
I fixed two of the three decoders with regard to fill bits. (I could fix the
third one too, but would prefer to have a test file first.) Now there's only
PDFBOX-457 left. It could be an EOL, but I can neither prove nor disprove that
theory. An EOL would make no sense in a G4-encoded document, at least
according to Wikipedia:
https://en.wikipedia.org/wiki/Group_4_compression
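
For readers unfamiliar with the terms: in T.4, an EOL is the code word
000000000001 (eleven zero bits followed by a one), and fill bits are extra
zero bits inserted in front of an EOL. The following is only a minimal sketch
of how a decoder might skip fill bits and locate the end of an EOL. It is not
the PDFBox code; the class EolScanner and its method names are hypothetical,
and it assumes the bits are stored most significant bit first.

```java
/** Hypothetical helper (not part of PDFBox): locates T.4 EOL codes in a bit stream. */
public final class EolScanner {
    private final byte[] data;
    private int bitPos; // index of the next bit to read; assumes MSB-first bit order

    public EolScanner(byte[] data) {
        this.data = data;
    }

    /** Reads one bit (MSB first), or returns -1 once the data is exhausted. */
    private int readBit() {
        if (bitPos >= data.length * 8) {
            return -1;
        }
        int bit = (data[bitPos >> 3] >> (7 - (bitPos & 7))) & 1;
        bitPos++;
        return bit;
    }

    /**
     * Advances past the next EOL, i.e. a run of at least 11 zero bits
     * followed by a single 1 bit. Zeros beyond the required 11 are treated
     * as fill bits and skipped along with the EOL.
     *
     * @return the bit position just after the EOL, or -1 if no EOL was found
     */
    public int skipToAfterEol() {
        int zeros = 0;
        int bit;
        while ((bit = readBit()) != -1) {
            if (bit == 0) {
                zeros++;
            } else if (zeros >= 11) {
                return bitPos; // the 1 bit terminating the EOL was just consumed
            } else {
                zeros = 0; // run of zeros was too short to be an EOL
            }
        }
        return -1;
    }
}
```

For example, new EolScanner(new byte[] {0x00, 0x01}).skipToAfterEol() treats
the first four zero bits as fill, so the EOL ends on a byte boundary. A G4
(T.6) stream is not expected to contain such codes between lines, which is why
finding one there would be surprising.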
> Error when decoding CCITT compressed data that contains EOLs, fill bits etc.
> ----------------------------------------------------------------------------
>
> Key: PDFBOX-1086
> URL: https://issues.apache.org/jira/browse/PDFBOX-1086
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Reporter: Jeremias Maerki
> Assignee: Jeremias Maerki
> Labels: CCITTFaxDecode, ccitt
>
> The TIFFFaxDecoder class (originally coming from JAI via XML Graphics
> Commons) does not handle cases like EOLs between lines and in front. But the
> PDF CCITTFaxDecode filter needs to allow many different variants of the
> encoding. Apparently, TIFF has a relatively restricted way of encoding CCITT
> data, so TIFFFaxDecoder was not written to be as flexible as we need it.
> Ideally, PDFBox should handle anything that gets thrown at it.
> It appears that it would be rather difficult to retrofit TIFFFaxDecoder with
> the necessary flexibility. So, new decoders for T.4 and T.6 should probably
> be written.
--
This message was sent by Atlassian JIRA
(v6.2#6252)