[ 
https://issues.apache.org/jira/browse/PDFBOX-756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler reopened PDFBOX-756:
---------------------------------------
      Assignee: Andreas Lehmkühler

I stumbled upon this issue while working on some other things. PDFBox is now
able to use the DictionaryEncoding if well-known character names are used. In
the given case some mappings are missing. I'm going to add them to
additional.txt, which handles character mappings that are not defined by the
Adobe Glyph List, most likely TeX-based mappings.
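
To illustrate the idea, here is a minimal sketch of a name-based lookup with a
fallback list. It is not PDFBox's actual code: the class GlyphNameLookup, its
methods, the glyph name "texplus" and both per-font code tables are
assumptions made up for this example. Only the code values \x3 to \x7 are
taken from the report below. It shows why the same code can stand for
different characters in two files, and why a missing name mapping lets the raw
low ASCII code leak into the extracted text.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only; this is not PDFBox's actual API. */
class GlyphNameLookup {
    // Stand-in for a few entries of the Adobe Glyph List.
    static final Map<String, String> STANDARD_NAMES = new HashMap<>();
    // Stand-in for extra entries such as those kept in additional.txt.
    static final Map<String, String> ADDITIONAL_NAMES = new HashMap<>();

    static {
        STANDARD_NAMES.put("parenleft", "(");
        STANDARD_NAMES.put("parenright", ")");
        STANDARD_NAMES.put("plus", "+");
        STANDARD_NAMES.put("equal", "=");
        // Hypothetical non-standard name, invented for this sketch.
        ADDITIONAL_NAMES.put("texplus", "+");
    }

    /** Resolves a glyph name, or returns null if neither list knows it. */
    static String toUnicode(String glyphName) {
        String unicode = STANDARD_NAMES.get(glyphName);
        return unicode != null ? unicode : ADDITIONAL_NAMES.get(glyphName);
    }

    /** Decodes character codes using one font's code-to-name table. */
    static String decode(int[] codes, Map<Integer, String> fontEncoding) {
        StringBuilder out = new StringBuilder();
        for (int code : codes) {
            String name = fontEncoding.get(code);
            String unicode = (name == null) ? null : toUnicode(name);
            // Without a usable mapping the raw code ends up in the text.
            out.append(unicode != null ? unicode : String.valueOf((char) code));
        }
        return out.toString();
    }

    /** Assumed code table for the first file in the report. */
    static Map<Integer, String> fontA() {
        Map<Integer, String> encoding = new HashMap<>();
        encoding.put(3, "parenleft");
        encoding.put(4, "plus");
        encoding.put(5, "parenright");
        encoding.put(6, "equal");
        return encoding;
    }

    /** Assumed code table for the second file; code 4 means something else. */
    static Map<Integer, String> fontB() {
        Map<Integer, String> encoding = new HashMap<>();
        encoding.put(4, "parenleft");
        encoding.put(5, "parenright");
        encoding.put(6, "equal");
        encoding.put(7, "texplus");
        return encoding;
    }

    public static void main(String[] args) {
        System.out.println(decode(new int[] {3, 4, 5, 6}, fontA()));
        System.out.println(decode(new int[] {4, 7, 5, 6}, fontB()));
    }
}
```

Because each font carries its own code-to-name table, resolving by name rather
than by code value is what keeps the two files apart in this sketch.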

> Some characters from TeX-created files are mapped into ASCII range 1-31
> -----------------------------------------------------------------------
>
>                 Key: PDFBOX-756
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-756
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.2.0
>         Environment: Mac OS X 10.6.4
>            Reporter: Thomas Fischer
>            Assignee: Andreas Lehmkühler
>            Priority: Minor
>         Attachments: 826130.pdf, 826130.txt
>
>
> For some TeX-created files, some characters are mapped to low ASCII values. 
> Example:
> fx  2y − fx − 2y 
> instead of
> f(x + 2y) − f(x − 2y) =
> With the non-printable characters denoted by \xN, PDFBox's result is
> f\x3x\x4 2y\x5 − f\x3x − 2y\x5 \x6
> This probably cannot be fixed, since in another file the same numbers 
> represent different characters:
> Za  {a, a  1, . . .}
> instead of
> Z(a) = {a, a + 1,...}
> (Z\x4a\x5 \x6 {a, a \x7 1,  . . .})
> in another file.



