[jira] [Commented] (PDFBOX-2532) Text extraction fails due to the usage of the internal font mapping

JIRA Mon, 01 Dec 2014 11:34:31 -0800

    [ 
https://issues.apache.org/jira/browse/PDFBOX-2532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230296#comment-14230296
 ]


Andreas Lehmkühler commented on PDFBOX-2532:
--------------------------------------------

Maybe you are using the wrong PDF? My Acrobat is able to extract the text, and 
so is the newest branch version. And no, using the internal font mapping for 
text extraction isn't suitable in any situation.

> Text extraction fails due to the usage of the internal font mapping
> -------------------------------------------------------------------
>
>                 Key: PDFBOX-2532
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2532
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 2.0.0
>            Reporter: Andreas Lehmkühler
>             Fix For: 2.0.0
>
>         Attachments: PDFBOX2247-701542.pdf
>
>
> If a PDF doesn't provide any mapping (neither an encoding nor a ToUnicode 
> mapping), we have to decide ourselves where to get a suitable mapping. We 
> can't use the internal font mapping of the Type1C font, as it doesn't work in 
> every case; see PDFBOX-2377.
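
The lookup order described in the issue could be sketched roughly as follows.
This is only an illustration under stated assumptions, not PDFBox code: the
class MappingFallback, the method resolve and the plain Map arguments are all
hypothetical, and the issue does not say which source the final fallback
should be, so it is left as a caller-supplied placeholder. The font's internal
mapping is deliberately not consulted.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch; none of these names are part of the PDFBox API. */
final class MappingFallback {

    /**
     * Resolves a character code to Unicode text by trying each source in
     * order: the ToUnicode mapping, then the encoding, then a fallback
     * mapping chosen by the caller. All maps must be non-null; pass
     * Map.of() for a source the PDF does not provide.
     */
    public static Optional<String> resolve(int code,
                                           Map<Integer, String> toUnicode,
                                           Map<Integer, String> encoding,
                                           Map<Integer, String> fallback) {
        for (Map<Integer, String> source : List.of(toUnicode, encoding, fallback)) {
            String text = source.get(code);
            if (text != null) {
                return Optional.of(text); // first source that knows the code wins
            }
        }
        // No source knows this code; the caller decides how to report that.
        return Optional.empty();
    }

    public static void main(String[] args) {
        // The PDF provides neither a ToUnicode mapping nor an encoding,
        // so only the (assumed) fallback mapping can answer.
        Map<Integer, String> none = Map.of();
        Map<Integer, String> fallback = Map.of(65, "A");
        System.out.println(resolve(65, none, none, fallback));
        System.out.println(resolve(66, none, none, fallback));
    }
}
```

Returning an empty Optional rather than guessing keeps an unmapped code
visible to the caller instead of silently producing wrong text.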



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
