[ https://issues.apache.org/jira/browse/PDFBOX-3438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15395294#comment-15395294 ]
Oliver Steinau commented on PDFBOX-3438:
----------------------------------------
Thank you for your prompt reply! Unfortunately, I cannot build PDFBox from
source, so I cannot use the patch.
Thinking about it, there's not much an extractor can do without a proper
Unicode mapping. On the other hand, the file was created by Acrobat Distiller,
which is a fairly common PDF producer. It may be worth examining other files
created by Distiller and adding your solution to PDFBox as an optional feature
for those files (perhaps Distiller always omits the mappings but consistently
creates names like this).
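To illustrate what such an optional fallback might look like, here is a
minimal sketch. It is not the attached patch and not PDFBox API: the class
GlyphNameFallback and its method are hypothetical. It also does not assume
anything about the names in test.pdf; purely as an example, it handles the
"uniXXXX" and "uXXXX".."uXXXXXX" glyph-name forms described in the Adobe Glyph
List Specification and returns an empty result for anything else.

```java
import java.util.Optional;

// Hypothetical helper for illustration only; not part of PDFBox.
final class GlyphNameFallback {
    private GlyphNameFallback() {}

    // Tries to derive Unicode text from a glyph name when no mapping exists.
    // Handles only "uniXXXX..." (groups of four uppercase hex digits) and
    // "uXXXX".."uXXXXXX"; any other name yields Optional.empty().
    static Optional<String> toUnicode(String glyphName) {
        if (glyphName == null) {
            return Optional.empty();
        }
        int len = glyphName.length();
        if (glyphName.startsWith("uni") && len > 3 && (len - 3) % 4 == 0) {
            StringBuilder sb = new StringBuilder();
            for (int i = 3; i < len; i += 4) {
                int cp = parseUpperHex(glyphName.substring(i, i + 4));
                if (!isScalarValue(cp)) {
                    return Optional.empty();
                }
                sb.appendCodePoint(cp);
            }
            return Optional.of(sb.toString());
        }
        if (glyphName.startsWith("u") && len >= 5 && len <= 7) {
            int cp = parseUpperHex(glyphName.substring(1));
            if (isScalarValue(cp)) {
                return Optional.of(new String(Character.toChars(cp)));
            }
        }
        return Optional.empty();
    }

    // Parses uppercase hex digits; returns -1 if any other character occurs.
    private static int parseUpperHex(String s) {
        int value = 0;
        for (char c : s.toCharArray()) {
            int digit;
            if (c >= '0' && c <= '9') {
                digit = c - '0';
            } else if (c >= 'A' && c <= 'F') {
                digit = c - 'A' + 10;
            } else {
                return -1;
            }
            value = value * 16 + digit;
        }
        return value;
    }

    // True for valid Unicode scalar values (surrogates excluded).
    private static boolean isScalarValue(int cp) {
        return (cp >= 0 && cp <= 0xD7FF) || (cp >= 0xE000 && cp <= 0x10FFFF);
    }
}
```

An extractor could consult a helper like this only after its regular lookup
fails, and only when the user has opted in, so that files with proper mappings
are unaffected.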
Anyway, I would downgrade this issue to a "New feature" or a "Wish" -- or
should it be deleted altogether?
> only garbage extracted, lots of warnings "No Unicode mapping..."
> ----------------------------------------------------------------
>
> Key: PDFBOX-3438
> URL: https://issues.apache.org/jira/browse/PDFBOX-3438
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 2.0.2
> Reporter: Oliver Steinau
> Attachments: PDFBOX-3438.diff, PDFBOX-3438.txt, test.pdf
>
>
> When I try to extract text from this PDF, I get lots of warnings "No Unicode
> mapping for ...", and as output I only get garbage.
> The PDF file displays fine in Acrobat Reader, and pdftotext.exe extracts the
> text correctly.
> The PDF file seems to have an embedded Type 1 font with a custom encoding.