[
https://issues.apache.org/jira/browse/PDFBOX-5023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17237472#comment-17237472
]
Tilman Hausherr commented on PDFBOX-5023:
-----------------------------------------
This is in the content stream. It's a command to replace an extracted item with
something else.
{code}
BT
131.129 718.38 Td
/Span << /ActualText (\376\377\000.\000,\006G\006'\006F\006J\000 ) >> BDC
(\000\003\001\223\001\212\001U\001\215\000\017\000\021) Tj
EMC
ET
{code}
Here the glyphs for "\000\003\001\223\001\212\001U\001\215\000\017\000\021" are
displayed on the screen, but for text extraction the text
"\376\377\000.\000,\006G\006'\006F\006J\000 " would have to be used instead.
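To make the replacement text readable, here is a minimal sketch in plain Java (no PDFBox calls) that turns the body of a PDF literal string into bytes and, when it starts with the FE FF byte order mark, decodes the rest as UTF-16BE. The class and method names (ActualTextDecoder, literalToBytes, decode) are made up for this illustration. The ISO-8859-1 fallback is only a stand-in for PDFDocEncoding, and backslash-newline line continuation is not handled.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of PDFBox.
final class ActualTextDecoder {

    /** Converts the body of a PDF literal string (without the outer parentheses) to raw bytes. */
    static byte[] literalToBytes(String literal) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < literal.length()) {
            char c = literal.charAt(i++);
            if (c != '\\' || i == literal.length()) {
                out.write(c & 0xFF); // ordinary character (or a trailing lone backslash)
                continue;
            }
            char e = literal.charAt(i);
            if (e >= '0' && e <= '7') {
                // Octal escape: one to three octal digits form a single byte.
                int value = 0;
                int digits = 0;
                while (digits < 3 && i < literal.length()
                        && literal.charAt(i) >= '0' && literal.charAt(i) <= '7') {
                    value = value * 8 + (literal.charAt(i) - '0');
                    i++;
                    digits++;
                }
                out.write(value & 0xFF);
            } else {
                i++;
                switch (e) {
                    case 'n': out.write('\n'); break;
                    case 'r': out.write('\r'); break;
                    case 't': out.write('\t'); break;
                    case 'b': out.write('\b'); break;
                    case 'f': out.write('\f'); break;
                    default: out.write(e & 0xFF); // \( \) \\ and unknown escapes
                }
            }
        }
        return out.toByteArray();
    }

    /** Decodes a literal string as a PDF text string: UTF-16BE if it starts with FE FF. */
    static String decode(String literal) {
        byte[] bytes = literalToBytes(literal);
        boolean utf16 = bytes.length >= 2
                && (bytes[0] & 0xFF) == 0xFE && (bytes[1] & 0xFF) == 0xFF;
        if (utf16) {
            return new String(bytes, 2, bytes.length - 2, StandardCharsets.UTF_16BE);
        }
        // Assumption: ISO-8859-1 as a rough stand-in for PDFDocEncoding.
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }
}
```

Calling ActualTextDecoder.decode with the /ActualText string above (backslashes doubled in Java source) yields ".," followed by U+0647 U+0627 U+0646 U+064A and a space, i.e. the Arabic letters that an extractor would be expected to output in place of the glyph codes in the Tj operand.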
> OpenType Layout tables used in font ArabicTransparent-ARABIC are not
> implemented in PDFBox and will be ignored
> --------------------------------------------------------------------------------------------------------------
>
> Key: PDFBOX-5023
> URL: https://issues.apache.org/jira/browse/PDFBOX-5023
> Project: PDFBox
> Issue Type: Wish
> Components: FontBox, Text extraction
> Affects Versions: 2.0.8
> Reporter: Richard Azar
> Priority: Major
> Labels: fop-teaming
> Attachments: ExtractText.txt, log PDFbox.txt, pdfsample.pdf, sc1.PNG,
> sc2.PNG, sc3.PNG
>
>
> I am loading a PDF document with TrueType and TrueType CID fonts (both within
> the same document) and only the TrueType font text is extracted using
> tStripper.getText.
> Getting the below error in logs (full logs attached)
> OpenType Layout tables used in font ArabicTransparent-ARABIC are not
> implemented in PDFBox and will be ignored.