[
https://issues.apache.org/jira/browse/PDFBOX-3962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr closed PDFBOX-3962.
-----------------------------------
Resolution: Won't Fix
> No unicode mapping / Text not extracting
> ----------------------------------------
>
> Key: PDFBOX-3962
> URL: https://issues.apache.org/jira/browse/PDFBOX-3962
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Reporter: Roman
> Attachments: 72083_qdf.pdf
>
>
> PDFTextStripper does not extract the following text (the large letters at
> the top of the page) from the attached [^72083_qdf.pdf] file:
> {code}
> AGGIE NIGHT
> AT ENRON FIELD
> FRIDAY, JUNE 15, 2001 at 7:05
> HOUSTON ASTROS VS. TEXAS RANGERS
> {code}
> Acrobat Reader does not extract this text correctly either, although some
> other PDF viewers do.
> I also found a workaround that makes the extraction work; see below.
> 1. Find this code block in *LegacyPDFStreamEngine.java*
> {code}
> if(unicode == null) {
>     if(!(font instanceof PDSimpleFont)) {
>         return;
>     }
>     char c = (char)code;
>     unicode = new String(new char[]{c});
> }
> {code}
> 2. Insert this code block just before the block found in step 1.
> {code}
> if (unicode == null) {
>     if (font instanceof PDType1CFont) {
>         String name = ((PDType1CFont) font).codeToName(code);
>         try {
>             Method method = PDType1CFont.class.getDeclaredMethod("readEncodingFromFont");
>             method.setAccessible(true);
>             Encoding encoding = (Encoding) method.invoke(font);
>             Integer newCode = encoding.getNameToCodeMap().get(name);
>             if (newCode != null && newCode.intValue() != 0) {
>                 unicode = new String(new char[]{(char) newCode.byteValue()});
>             }
>         } catch (NoSuchMethodException e) {
>             e.printStackTrace();
>         } catch (IllegalAccessException e) {
>             e.printStackTrace();
>         } catch (InvocationTargetException e) {
>             e.printStackTrace();
>         }
>     }
> }
> {code}
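
The workaround quoted above amounts to a two-step lookup: character code to glyph name, then glyph name to the code that the font's built-in encoding assigns to it, with that second code treated as the character. The following is only a minimal standalone sketch of that idea. It uses no PDFBox classes; the two maps are hypothetical stand-ins for the font's code-to-name table and its built-in encoding, and the glyph name "G36" and all codes are invented for illustration. It also casts with intValue() rather than byteValue(), on the assumption that codes above 127 should not be sign-extended (in Java, `(char) Integer.valueOf(233).byteValue()` yields U+FFE9, not U+00E9).

```java
import java.util.HashMap;
import java.util.Map;

final class GlyphNameFallback {

    /**
     * Sketch of the fallback: code -> glyph name -> code in the font's
     * built-in encoding -> one-character string.
     * Both maps are hypothetical stand-ins, not PDFBox data structures.
     * Returns null when no usable mapping exists.
     */
    static String resolve(int code,
                          Map<Integer, String> codeToName,
                          Map<String, Integer> builtInNameToCode) {
        String name = codeToName.get(code);
        if (name == null) {
            return null; // the code has no glyph name
        }
        Integer newCode = builtInNameToCode.get(name);
        if (newCode == null || newCode.intValue() == 0) {
            return null; // the built-in encoding does not know this glyph
        }
        // intValue() keeps codes 128..255 intact; byteValue() would
        // sign-extend them into the U+FF80..U+FFFF range.
        return String.valueOf((char) newCode.intValue());
    }

    public static void main(String[] args) {
        // Invented sample data: the document uses code 3 for glyph "G36",
        // and the (hypothetical) built-in encoding places "G36" at 65.
        Map<Integer, String> codeToName = new HashMap<>();
        codeToName.put(3, "G36");
        Map<String, Integer> builtInNameToCode = new HashMap<>();
        builtInNameToCode.put("G36", 65);

        System.out.println(resolve(3, codeToName, builtInNameToCode));
    }
}
```

Whether the built-in code really corresponds to the intended character depends entirely on the font; that assumption holds for the sample data here but is not guaranteed in general.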