[
https://issues.apache.org/jira/browse/PDFBOX-756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andreas Lehmkühler reopened PDFBOX-756:
---------------------------------------
Assignee: Andreas Lehmkühler
I stumbled upon this issue while working on something else. PDFBox is now
able to use the DictionaryEncoding if well-known character names are used. In
the given case some mappings are missing. I'm going to add them to
additional.txt, which handles character mappings that are not defined by the
Adobe Glyph List, most likely TeX-based mappings.
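
To illustrate the idea, here is a minimal sketch of a glyph-name lookup that
consults a primary list first and falls back to a table of additional
mappings. It is not PDFBox's actual code: the class GlyphNameLookup, its
methods, and the glyph name "exampleTexPlus" are made up for this example, and
it assumes both tables use the "name;hex code point(s)" line format of the
Adobe Glyph List.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only; class and method names are not part of PDFBox.
class GlyphNameLookup {
    private final Map<String, String> primary = new HashMap<>();
    private final Map<String, String> additional = new HashMap<>();

    GlyphNameLookup(String primaryText, String additionalText) {
        load(primary, primaryText);
        load(additional, additionalText);
    }

    // Parses lines of the form "name;XXXX" or "name;XXXX YYYY" (hex code
    // points), skipping blank lines and '#' comments.
    private static void load(Map<String, String> target, String text) {
        for (String rawLine : text.split("\n")) {
            String line = rawLine.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            String[] parts = line.split(";");
            if (parts.length != 2) {
                continue; // ignore malformed lines in this sketch
            }
            StringBuilder unicode = new StringBuilder();
            for (String hex : parts[1].trim().split("\\s+")) {
                unicode.appendCodePoint(Integer.parseInt(hex, 16));
            }
            target.put(parts[0].trim(), unicode.toString());
        }
    }

    // Returns the Unicode string for a glyph name, or null if neither
    // table knows the name.
    String toUnicode(String glyphName) {
        String result = primary.get(glyphName);
        if (result == null) {
            result = additional.get(glyphName);
        }
        return result;
    }

    public static void main(String[] args) {
        // "plus" and "parenleft" are standard glyph names;
        // "exampleTexPlus" is an invented name standing in for a
        // TeX-specific glyph name that the primary list lacks.
        String primaryText = "# primary list\nplus;002B\nparenleft;0028\n";
        String additionalText = "# additional mappings\nexampleTexPlus;002B\n";
        GlyphNameLookup lookup = new GlyphNameLookup(primaryText, additionalText);
        System.out.println(lookup.toUnicode("parenleft"));
        System.out.println(lookup.toUnicode("exampleTexPlus"));
        System.out.println(lookup.toUnicode("unknownName"));
    }
}
```

With a fallback table like this, a font that uses names outside the primary
list can still be resolved to readable text instead of falling through to raw
character codes.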
> Some characters from TeX-created files are mapped into ASCII range 1-31
> -----------------------------------------------------------------------
>
> Key: PDFBOX-756
> URL: https://issues.apache.org/jira/browse/PDFBOX-756
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 1.2.0
> Environment: Mac OS X 10.6.4
> Reporter: Thomas Fischer
> Assignee: Andreas Lehmkühler
> Priority: Minor
> Attachments: 826130.pdf, 826130.txt
>
>
> For some TeX-created files, some characters are mapped to low ASCII values.
> Example:
> fx 2y − fx − 2y
> instead of
> f(x + 2y) − f(x − 2y) =
> With the non-printable characters denoted by \xN, PDFBox's result is
> f\x3x\x4 2y\x5 − f\x3x − 2y\x5 \x6
> This probably cannot be fixed, since in another file the same numbers
> represent different characters:
> Za {a, a 1, . . .}
> instead of
> Z(a) = {a, a + 1,...}
> (Z\x4a\x5 \x6 {a, a \x7 1, . . .})
> in another file.