[ https://issues.apache.org/jira/browse/PDFBOX-1823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andreas Lehmkühler closed PDFBOX-1823.
--------------------------------------
Resolution: Not A Problem
Assignee: Andreas Lehmkühler
The text can't be extracted. The PDF doesn't contain any information (e.g. a
ToUnicode CMap) to map the internal glyph IDs to readable text.
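To illustrate why there is nothing to extract, here is a simplified model in Java. It is not PDFBox code: it assumes an extractor sees only a sequence of glyph IDs plus an optional glyph-ID-to-Unicode table (in a real PDF this is usually a ToUnicode CMap or a known encoding). The class name GlyphDecoder and the sample glyph IDs are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Simplified model (not PDFBox code): glyph IDs can only become text if some
 * table maps them to Unicode. In a real PDF that table is typically a
 * /ToUnicode CMap or a known encoding; here it is just a Map.
 */
final class GlyphDecoder {
    /** U+FFFD REPLACEMENT CHARACTER, emitted when a glyph ID has no mapping. */
    static final String UNMAPPED = "\uFFFD";

    private GlyphDecoder() {}

    /** Decodes glyph IDs with the given table; unmapped IDs become U+FFFD. */
    static String decode(int[] glyphIds, Map<Integer, String> toUnicode) {
        StringBuilder sb = new StringBuilder();
        for (int gid : glyphIds) {
            String text = toUnicode.get(gid);
            sb.append(text != null ? text : UNMAPPED);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int[] gids = {36, 72, 79}; // hypothetical glyph IDs
        Map<Integer, String> table = new HashMap<Integer, String>();
        table.put(36, "H");
        table.put(72, "e");
        table.put(79, "l");
        System.out.println(decode(gids, table));                           // Hel
        System.out.println(decode(gids, new HashMap<Integer, String>()));  // three U+FFFD characters
    }
}
```

With an empty table every glyph decodes to U+FFFD, which many viewers display as a box or question mark; that is roughly the situation described in the report.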
The only workaround I know of is to convert every single page of the PDF to an
image and pass the result to OCR software. But I guess that isn't very handy ...
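The workaround can be structured as a small pipeline. This is a minimal sketch under stated assumptions: PageRenderer and OcrEngine are hypothetical interfaces, not PDFBox or OCR library APIs. In PDFBox 2.x a PageRenderer could be backed by PDFRenderer.renderImageWithDPI, and an OcrEngine would wrap whatever OCR software is available (e.g. Tesseract); neither is wired up here.

```java
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical: renders one page of a document to an image. */
interface PageRenderer {
    int pageCount();
    BufferedImage render(int pageIndex, float dpi);
}

/** Hypothetical: runs OCR on one image and returns the recognized text. */
interface OcrEngine {
    String recognize(BufferedImage image);
}

/** Sketch of the "render every page, then OCR it" workaround. */
final class OcrFallback {
    private OcrFallback() {}

    /** Renders each page at the given DPI and returns one OCR result per page, in page order. */
    static List<String> extractViaOcr(PageRenderer renderer, OcrEngine ocr, float dpi) {
        List<String> pages = new ArrayList<>();
        for (int i = 0; i < renderer.pageCount(); i++) {
            BufferedImage image = renderer.render(i, dpi);
            pages.add(ocr.recognize(image));
        }
        return pages;
    }
}
```

Keeping rendering and recognition behind interfaces lets the loop be tested with stubs and lets the rendering resolution be tuned without touching the loop itself.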
Anyway, I've closed this issue, as there isn't any problem with PDFBox. If you
have any further questions, please address them to one of our mailing lists.
See [1] for how to subscribe.
[1] http://pdfbox.apache.org/mailinglists.html
> Apache PDFBox 1.6.0 TextStripper not able to recognise characters having
> "Frutiger LT - 45" fonts
> -------------------------------------------------------------------------------------------------
>
> Key: PDFBOX-1823
> URL: https://issues.apache.org/jira/browse/PDFBOX-1823
> Project: PDFBox
> Issue Type: Bug
> Components: FontBox
> Affects Versions: 1.6.0
> Environment: jdk1.6
> Reporter: Chitrang Natu
> Assignee: Andreas Lehmkühler
> Labels: newbie
> Attachments: PDF_With_Frutiger_font.pdf,
> TC01_output.concat.MD302AE_Part2.doc, Test_Frutiger.java,
> fontbox-checkstyle.xml, pdfbox-checkstyle.xml, pom.xml
>
> Original Estimate: 504h
> Remaining Estimate: 504h
>
> When I tried to extract content from PDFs, I was able to extract all text
> with the PDFBox API, but I had trouble with fonts in the 'Frutiger' style:
> for these I get square boxes in place of characters.
> It seems PDFBox FontBox supports only the 14 standard fonts, and none of
> them is a Frutiger-style font.
> If anybody can suggest something, that would be of great help. I am in
> urgent need of a solution.
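Before falling back to OCR, it may help to detect which pages actually produced box-like output. The heuristic below is an assumption, not something PDFBox provides: it treats U+FFFD, non-whitespace control characters, and Private Use Area code points as "unmapped" and reports their share of the non-whitespace characters in a string, such as the text PDFTextStripper returned for one page. The class name GarbleCheck is hypothetical.

```java
/**
 * Heuristic sketch (not part of PDFBox): estimates how much of an extracted
 * string consists of characters that often render as empty boxes.
 */
final class GarbleCheck {
    private GarbleCheck() {}

    /** True for U+FFFD, Private Use Area code points, and control characters other than tab/CR/LF. */
    static boolean looksUnmapped(int cp) {
        if (cp == 0xFFFD) return true;
        if (Character.getType(cp) == Character.PRIVATE_USE) return true;
        return Character.isISOControl(cp) && cp != '\n' && cp != '\r' && cp != '\t';
    }

    /** Fraction of non-whitespace code points that look unmapped (0.0 for empty input). */
    static double unmappedRatio(String text) {
        int total = 0;
        int bad = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.isWhitespace(cp)) continue; // ignore spaces and line breaks
            total++;
            if (looksUnmapped(cp)) bad++;
        }
        return total == 0 ? 0.0 : (double) bad / total;
    }
}
```

A caller might send only the pages whose ratio exceeds some threshold (for example 0.3, an arbitrary illustrative value) to the OCR workaround and keep the directly extracted text for the rest.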