[
https://issues.apache.org/jira/browse/PDFBOX-5023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17236126#comment-17236126
]
Maruan Sahyoun commented on PDFBOX-5023:
----------------------------------------
[~richardazar] text extraction should work without any changes. The easiest
way to check would be to give it a quick try with the
[ExtractText|https://pdfbox.apache.org/2.0/commandline.html#extracttext] command from
the [PDFBox
app|https://www.apache.org/dyn/closer.lua?filename=pdfbox/2.0.21/pdfbox-app-2.0.21.jar&action=download].
Compare the result with text saved from Adobe Reader/Acrobat if you have that at
hand.
You could also attach a sample PDF to the ticket.
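The comparison with Adobe Reader/Acrobat can be made a little more concrete. Below is a minimal sketch of a hypothetical helper (not part of PDFBox or its command-line tools) that counts Arabic code points in two plain-text files: one written by ExtractText and one saved from Adobe Reader/Acrobat. It assumes both files are UTF-8 encoded. The class name `ArabicTextCheck` and the exact Unicode ranges checked are illustrative choices, not anything defined by PDFBox.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

/**
 * Hypothetical helper, not part of PDFBox: compares two plain-text extractions
 * of the same PDF (e.g. PDFBox ExtractText output vs. text saved from Acrobat)
 * by counting Arabic code points in each. Assumes both files are UTF-8.
 */
final class ArabicTextCheck {

    private ArabicTextCheck() {}

    /** True if the code point lies in one of the Unicode blocks used for Arabic script. */
    static boolean isArabic(int cp) {
        return (cp >= 0x0600 && cp <= 0x06FF)   // Arabic
            || (cp >= 0x0750 && cp <= 0x077F)   // Arabic Supplement
            || (cp >= 0x08A0 && cp <= 0x08FF)   // Arabic Extended-A
            || (cp >= 0xFB50 && cp <= 0xFDFF)   // Arabic Presentation Forms-A
            || (cp >= 0xFE70 && cp <= 0xFEFC);  // Presentation Forms-B, excluding U+FEFF (byte order mark)
    }

    /** Counts Arabic code points (not chars, so surrogate pairs are handled correctly). */
    static long countArabic(String text) {
        return text.codePoints().filter(ArabicTextCheck::isArabic).count();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: ArabicTextCheck <pdfbox-output.txt> <acrobat-output.txt>");
            System.exit(2);
        }
        // Both files are read as UTF-8; adjust if Acrobat saved them in another encoding.
        String pdfbox = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        String acrobat = new String(Files.readAllBytes(Paths.get(args[1])), StandardCharsets.UTF_8);

        long fromPdfbox = countArabic(pdfbox);
        long fromAcrobat = countArabic(acrobat);
        System.out.printf("Arabic code points: PDFBox=%d, Acrobat=%d%n", fromPdfbox, fromAcrobat);

        if (fromPdfbox == 0 && fromAcrobat > 0) {
            System.out.println("PDFBox output contains no Arabic text, but the Acrobat output does.");
        }
    }
}
```

For example, after running ExtractText on the PDF, compile the class with javac and run `java ArabicTextCheck pdfbox.txt acrobat.txt`. The counts can legitimately differ if one tool emits Arabic presentation forms and the other emits base letters, which is why both kinds of ranges are checked; a zero on the PDFBox side against a non-zero count from Acrobat is the telling case, since it would match text from the CID font being missing entirely.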
> OpenType Layout tables used in font ArabicTransparent-ARABIC are not
> implemented in PDFBox and will be ignored
> --------------------------------------------------------------------------------------------------------------
>
> Key: PDFBOX-5023
> URL: https://issues.apache.org/jira/browse/PDFBOX-5023
> Project: PDFBox
> Issue Type: Wish
> Components: FontBox, Text extraction
> Affects Versions: 2.0.8
> Reporter: Richard Azar
> Priority: Major
> Labels: fop-teaming
> Attachments: image-2020-11-20-13-34-12-306.png, log PDFbox.txt,
> sc1.PNG
>
>
> I am loading a PDF document with TrueType and TrueType CID fonts (both within
> the same document), and only the text in the TrueType fonts is extracted using
> tStripper.getText.
> I am getting the error below in the logs (full logs attached):
> OpenType Layout tables used in font ArabicTransparent-ARABIC are not
> implemented in PDFBox and will be ignored.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)