[jira] [Closed] (PDFBOX-1823) Apache PDFBox 1.6.0 TextStripper not able to recognise characters having "Frutiger LT - 45" fonts

JIRA Thu, 02 Jan 2014 08:11:41 -0800

     [ 
https://issues.apache.org/jira/browse/PDFBOX-1823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Andreas Lehmkühler closed PDFBOX-1823.
--------------------------------------

    Resolution: Not A Problem
      Assignee: Andreas Lehmkühler

The text can't be extracted. The PDF doesn't contain any information to map the 
internal glyph ids to readable text.
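
To illustrate why extraction fails in such a case, here is a minimal sketch in
plain Java. It is not PDFBox code: the class name GlyphMappingSketch, the
decode method, the glyph ids and the mapping table are all hypothetical. It
assumes a font whose glyph ids only become readable through a lookup table (in
a real PDF that role can be played by, for example, a ToUnicode CMap). When the
table is missing, an extractor has nothing to decode with and can only emit
placeholders.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not part of PDFBox or FontBox.
class GlyphMappingSketch {
    // Placeholder emitted when a glyph id has no known Unicode value.
    static final char UNKNOWN = '\uFFFD';

    // Translate glyph ids to text using whatever mapping the document provides.
    static String decode(int[] glyphIds, Map<Integer, Character> toUnicode) {
        StringBuilder sb = new StringBuilder();
        for (int gid : glyphIds) {
            Character c = toUnicode.get(gid);
            sb.append(c != null ? c.charValue() : UNKNOWN);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int[] glyphIds = {17, 5, 9}; // made-up ids as they might appear in a content stream

        // Case 1: the document ships a mapping, so the text is recoverable.
        Map<Integer, Character> mapping = new HashMap<Integer, Character>();
        mapping.put(17, 'P');
        mapping.put(5, 'D');
        mapping.put(9, 'F');
        System.out.println(decode(glyphIds, mapping));

        // Case 2: no mapping at all, so every glyph becomes a placeholder.
        System.out.println(decode(glyphIds, new HashMap<Integer, Character>()));
    }
}
```

The glyph outlines are still present in the second case, which is why the page
renders correctly on screen even though no readable text can be extracted.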

The only workaround I know of is to convert every single page of the PDF to an 
image and pass the result to OCR software. But I guess that isn't very handy ...
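
Since rendering and OCR are expensive, it may help to decide per page whether
the fallback is needed at all. The following is a rough sketch in plain Java
under stated assumptions: the class OcrFallbackCheck and its methods are
hypothetical, and it assumes that unmappable glyphs show up in the extracted
string as U+FFFD, private-use code points or control characters. What actually
appears depends on the PDF and on the extractor, so the character test and the
threshold would need tuning. The sketch does not render pages or run OCR
itself.

```java
// Hypothetical helper; not part of PDFBox.
class OcrFallbackCheck {
    // Assumption: these characters indicate a glyph that could not be mapped.
    static boolean looksUnmapped(char c) {
        if (Character.isWhitespace(c)) {
            return false;
        }
        return c == '\uFFFD'
                || Character.getType(c) == Character.PRIVATE_USE
                || Character.isISOControl(c);
    }

    // Fraction of non-whitespace characters that look unmapped (0.0 for empty text).
    static double unmappedRatio(String text) {
        int visible = 0;
        int unmapped = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                continue;
            }
            visible++;
            if (looksUnmapped(c)) {
                unmapped++;
            }
        }
        return visible == 0 ? 0.0 : (double) unmapped / visible;
    }

    // True when the share of unmapped characters exceeds the given threshold.
    static boolean needsOcr(String text, double threshold) {
        return unmappedRatio(text) > threshold;
    }
}
```

A caller could pass the text extracted from a single page to
needsOcr(pageText, 0.3) and send only the flagged pages through the image and
OCR route; the 0.3 here is an arbitrary starting value.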

Anyway, I've closed this issue, as there isn't any problem with PDFBox. If you 
have any further questions, please address them to one of our mailing lists. 
See [1] for how to subscribe.

[1] http://pdfbox.apache.org/mailinglists.html



> Apache PDFBox 1.6.0 TextStripper not able to recognise characters having 
> "Frutiger LT - 45" fonts
> -------------------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-1823
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1823
>             Project: PDFBox
>          Issue Type: Bug
>          Components: FontBox
>    Affects Versions: 1.6.0
>         Environment: jdk1.6
>            Reporter: Chitrang Natu
>            Assignee: Andreas Lehmkühler
>              Labels: newbie
>         Attachments: PDF_With_Frutiger_font.pdf, 
> TC01_output.concat.MD302AE_Part2.doc, Test_Frutiger.java, 
> fontbox-checkstyle.xml, pdfbox-checkstyle.xml, pom.xml
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> When I try to extract content from PDFs, I can successfully extract all 
> text with the PDFBox API, but I have trouble with fonts in the 'Frutiger' 
> style. For these I get square boxes in place of characters.
> It seems PDFBox FontBox supports only 14 UTF character sets, and none of 
> them is a Frutiger-style font. 
> If anybody could suggest something, that would be of great help. I am in 
> urgent need of a solution.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
