[ https://issues.apache.org/jira/browse/PDFBOX-833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler reopened PDFBOX-833:
---------------------------------------


I can confirm that the fix introduced a regression in text extraction, as
described by Simon -> reopened.
                
> Wrong encoding with Type1C font when specific encoding is defined
> -----------------------------------------------------------------
>
>                 Key: PDFBOX-833
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-833
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 1.3.1
>            Reporter: Timo Boehme
>            Assignee: Andreas Lehmkühler
>             Fix For: 1.8.2
>
>         Attachments: pdfbox-833.patch, sample1-fixed.png, sample1-original.png, sample.pdf, simpleh2.pdf
>
>
> The Type1C font implementation overrides the encode() method of the PDFont
> base class. As a result, codes are mapped to characters as defined in the
> font itself.
> However, if an encoding is given explicitly (such as WinAnsiEncoding), this
> leads to wrong results whenever the encoding's codes do not match the font's
> glyph codes.
> In a test document (which unfortunately I cannot make public; it is an
> article from Elsevier), a Type1C font is embedded that defines a copyright
> sign at glyph position 259. The encoding is defined as WinAnsiEncoding, and
> the text characters are encoded accordingly. For the copyright sign the code
> is 0xa9 (169), at which the font itself defines the glyph 'quotesingle'.
> Since I currently have no other test cases, I implemented the following
> workaround for WinAnsiEncoding (which might be relaxed to other PDF encodings
> as well). In PDType1CFont.encode() I start with:
> if (getEncoding() instanceof WinAnsiEncoding) {
>     // use the PDFont (base class) encoding instead of the font's built-in one
>     return super.encode(bytes, offset, length);
> }
> This resolves the encoding problems for text extraction.
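The difference between the two lookup paths described above can be illustrated with a minimal, self-contained Java sketch. It does not use PDFBox: the class GlyphLookupDemo, its method glyphName and the two tiny code-to-glyph maps are hypothetical stand-ins. It assumes the embedded font's built-in encoding behaves like Adobe StandardEncoding, where code 0xA9 is 'quotesingle', while WinAnsiEncoding maps the same code to 'copyright'. It also assumes the relaxed form of the workaround, i.e. any explicitly given encoding takes precedence, not only WinAnsiEncoding.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, simplified model of the lookup order discussed in PDFBOX-833.
 * These are NOT PDFBox classes; only a few code points are filled in.
 */
class GlyphLookupDemo {

    // Tiny excerpt of WinAnsiEncoding: code -> glyph name.
    static final Map<Integer, String> WIN_ANSI = new HashMap<>();
    // Tiny excerpt of Adobe StandardEncoding, standing in for the font's
    // built-in encoding (an assumption about the embedded Type1C font).
    static final Map<Integer, String> BUILT_IN = new HashMap<>();

    static {
        WIN_ANSI.put(0x41, "A");
        WIN_ANSI.put(0xA9, "copyright");
        BUILT_IN.put(0x41, "A");
        BUILT_IN.put(0xA9, "quotesingle");
    }

    /**
     * Resolves a character code to a glyph name.
     *
     * @param code             the character code from the content stream
     * @param explicitEncoding the encoding given in the PDF, or null if none
     * @return the glyph name, or null if the code is unmapped
     */
    static String glyphName(int code, Map<Integer, String> explicitEncoding) {
        if (explicitEncoding != null) {
            // Workaround behaviour: an encoding given in the PDF wins.
            return explicitEncoding.get(code);
        }
        // Fallback: the font program's own code-to-glyph mapping.
        return BUILT_IN.get(code);
    }

    public static void main(String[] args) {
        // Reported behaviour: the built-in encoding is used.
        System.out.println("built-in: " + glyphName(0xA9, null));
        // Workaround behaviour: the explicit WinAnsiEncoding is honoured.
        System.out.println("explicit: " + glyphName(0xA9, WIN_ANSI));
    }
}
```

Running main prints "built-in: quotesingle" followed by "explicit: copyright", which mirrors the wrong and the expected extraction result for the copyright sign. The sketch deliberately ignores /Differences arrays and other encoding details a real PDF may carry.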

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
