[ https://issues.apache.org/jira/browse/PDFBOX-2532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230467#comment-14230467 ]

Andreas Lehmkühler commented on PDFBOX-2532:
--------------------------------------------

{quote}
I'm not sure what you mean by "internal font mapping", if you mean the embedded 
Encoding inside a Type1C font
{quote}
Yes, exactly.
{quote}
it is certainly appropriate to use that when the PDF does not provide any 
further information.
{quote}
But obviously not in all cases. And by the way, who says that it is certainly
appropriate?
{quote}
I'm using Acrobat Pro XI, build: 11.0.9.29
{quote}
I'm using Adobe Reader 11.0.9 on Windows and 9.5.5 on Linux.

Have a look at the PDF itself. The text is already readable as it is, and it
gets scrambled when the embedded encoding is applied. See PDFBOX-2377 for
details on my solution for the 1.8 branch.
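A rough sketch of the lookup order discussed here, written in Java since PDFBox is a Java library. All names (`MappingChoice`, `choose`, `decode`, `Source`) are hypothetical and are not PDFBox API. The PDF Encoding is simplified to a code-to-string map, and the fallback of reading the raw code as a Latin-1 character is an assumption, not necessarily what the PDFBOX-2377 fix does. What it illustrates is that the font's built-in Type1C encoding is never consulted when the PDF supplies neither an Encoding nor a ToUnicode map.

```java
import java.util.Map;

/**
 * Hypothetical sketch of the mapping priority discussed in PDFBOX-2532.
 * None of these names are PDFBox API.
 */
public class MappingChoice {

    enum Source { TO_UNICODE, PDF_ENCODING, FALLBACK }

    // Pick the mapping used for text extraction. The embedded (built-in)
    // Type1C encoding is intentionally never an option here.
    static Source choose(Map<Integer, String> toUnicode, Map<Integer, String> pdfEncoding) {
        if (toUnicode != null) {
            return Source.TO_UNICODE;
        }
        if (pdfEncoding != null) {
            return Source.PDF_ENCODING;
        }
        return Source.FALLBACK;
    }

    // Map one character code to text using the chosen source.
    // FALLBACK is an assumption: the code is read as a Latin-1 code point.
    static String decode(int code, Map<Integer, String> toUnicode, Map<Integer, String> pdfEncoding) {
        switch (choose(toUnicode, pdfEncoding)) {
            case TO_UNICODE:
                return toUnicode.getOrDefault(code, "");
            case PDF_ENCODING:
                return pdfEncoding.getOrDefault(code, "");
            default:
                return String.valueOf((char) code);
        }
    }

    public static void main(String[] args) {
        System.out.println(decode(65, null, null));             // prints "A"
        System.out.println(decode(65, null, Map.of(65, "B")));  // prints "B"
    }
}
```

The ordering (ToUnicode first, then the PDF's own Encoding) follows the issue description; only the last branch is where an implementation has to decide for itself.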



> Text extraction fails due to the usage of the internal font mapping
> -------------------------------------------------------------------
>
>                 Key: PDFBOX-2532
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2532
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 2.0.0
>            Reporter: Andreas Lehmkühler
>             Fix For: 2.0.0
>
>         Attachments: PDFBOX2247-701542.pdf
>
>
> If a PDF doesn't provide any mapping (neither an Encoding nor a ToUnicode
> mapping), we have to decide where to get a suitable mapping ourselves. We
> can't use the internal font mapping of the Type1C font, as it doesn't work in
> every case; see PDFBOX-2377.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
