[
https://issues.apache.org/jira/browse/PDFBOX-4648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932100#comment-16932100
]
Tilman Hausherr commented on PDFBOX-4648:
-----------------------------------------
No, you would have to use OCR. The problem was introduced when the PDF was
created. One could recreate the ToUnicode table, but it would take hours and
would probably work only for that file.
https://stackoverflow.com/questions/39485920/how-to-add-unicode-in-truetype0font-on-pdfbox-2-0-0
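To illustrate why a missing ToUnicode entry loses text, here is a minimal sketch in plain Java. It is not PDFBox code and does not use the PDFBox API. The class name, the map contents, and every CID value other than 47 (taken from the warning in the issue) are made up for illustration. In this sketch an unmapped CID is simply skipped; a real extractor may handle it differently.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ToUnicodeSketch {
    // Hypothetical stand-in for a font's ToUnicode table: CID -> Unicode text.
    static final Map<Integer, String> TO_UNICODE = new LinkedHashMap<>();
    static {
        TO_UNICODE.put(36, "P");
        TO_UNICODE.put(37, "D");
        TO_UNICODE.put(38, "F");
        // CID 47 is deliberately absent, as in "No Unicode mapping for CID+47".
    }

    // Decode the CIDs that have a mapping; unmapped CIDs contribute nothing.
    public static String decode(int[] cids) {
        StringBuilder sb = new StringBuilder();
        for (int cid : cids) {
            String text = TO_UNICODE.get(cid);
            if (text != null) {
                sb.append(text);
            }
        }
        return sb.toString();
    }

    // Collect the CIDs that have no entry in the table.
    public static List<Integer> unmapped(int[] cids) {
        List<Integer> missing = new ArrayList<>();
        for (int cid : cids) {
            if (!TO_UNICODE.containsKey(cid)) {
                missing.add(cid);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        int[] shown = {36, 37, 47, 38};       // four glyphs are drawn on the page
        System.out.println(decode(shown));    // only three of them yield text
        System.out.println(unmapped(shown));  // the glyph(s) with no Unicode value
    }
}
```

The glyph for CID 47 still renders on the page because drawing needs only the glyph outline. Text extraction needs the CID-to-Unicode step as well, so when that entry is missing the remaining way to get the character back is to recognise its shape, which is what OCR does.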
> OpenType Layout tables used in font ABCDEE+Times New Roman,Bold are not
> implemented in PDFBox and will be ignored
> -----------------------------------------------------------------------------------------------------------------
>
> Key: PDFBOX-4648
> URL: https://issues.apache.org/jira/browse/PDFBOX-4648
> Project: PDFBox
> Issue Type: Improvement
> Components: Text extraction
> Affects Versions: 2.0.4
> Reporter: wanling
> Priority: Major
> Attachments: 5e214f828f164322a6600f183191dda5-Adobe.txt,
> 5e214f828f164322a6600f183191dda5-PDFBox.txt,
> 5e214f828f164322a6600f183191dda5.pdf, image-2019-09-12-08-47-32-706.png,
> image-2019-09-18-05-55-26-771.png
>
>
> No PostScript name information is provided for the font Arial-BoldMT
> OpenType Layout tables used in font ABCDEE+Times New Roman,Bold are not
> implemented in PDFBox and will be ignored
> No Unicode mapping for CID+47 (47) in font ABCDEE+Times New Roman,Bold
>
> Adobe is normal, but PDFBox can't see _some parts (not all)_. OCI can't see
> it completely.