[
https://issues.apache.org/jira/browse/PDFBOX-1770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813833#comment-13813833
]
Tilman Hausherr edited comment on PDFBOX-1770 at 11/5/13 12:19 PM:
-------------------------------------------------------------------
What do you mean by "But I don't why excute"?
1) Anyway, I took a quick look and I think I understand what you mean.
PDFStreamEngine.java:380 is weird. From a string containing "14" it makes a
nonexistent Unicode character 12596 (0x3134).
2) I also noticed: in PDType1CFont.java, in load(), the variable codeToNameMap
is filled but then never used. mapping.getCode() (which holds exactly the char
values used in the PDF) is put there, yet nothing ever reads it. There should
be some logic that maps from mapping.getCode() to mapping.getSID(), and I
suspect that it involves codeToNameMap. A perfect result would be that 49
somehow leads to 396, because 396 does exist in codeToGlyph and maps to the
correct glyph 0.
3) CFFGlyph2D has the same problem in its constructor as described in 2).
Adding codeToGlyph.put(mapping.getCode(), glyphId) there produces a perfectly
rendered image, while the existing code renders only a lot of "?".
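
To illustrate 1) and 2), here is a minimal sketch. It is not PDFBox code: the
class CodeMappingSketch, both methods and the two maps are hypothetical
stand-ins. Only the numbers 49, 396, 0 and 12596 come from the comment above.
The first method shows that reading the bytes of "14" as one big-endian code
gives exactly 12596; whether line 380 does precisely this is an assumption
based on that match. The second method shows the kind of code -> SID -> glyph
lookup that seems to be missing.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Illustrative only: none of these names are PDFBox API.
class CodeMappingSketch {

    // Combine the bytes of a string, big-endian, into a single integer,
    // the way a multi-byte character code would be read.
    static int combineBytes(String s) {
        int code = 0;
        for (byte b : s.getBytes(StandardCharsets.US_ASCII)) {
            code = (code << 8) | (b & 0xFF);
        }
        return code;
    }

    // Resolve a character code to a glyph id via its SID.
    // Returns -1 when either lookup step has no entry.
    static int glyphForCode(Map<Integer, Integer> codeToSid,
                            Map<Integer, Integer> sidToGlyph,
                            int code) {
        Integer sid = codeToSid.get(code);
        if (sid == null) {
            return -1;
        }
        Integer glyph = sidToGlyph.get(sid);
        return glyph == null ? -1 : glyph;
    }

    public static void main(String[] args) {
        // '1' is 0x31 and '4' is 0x34; read together as 0x3134 they give 12596.
        System.out.println(combineBytes("14"));

        // Hypothetical tables holding the values mentioned in the comment.
        Map<Integer, Integer> codeToSid = new HashMap<>();
        codeToSid.put(49, 396);   // char code 49 ('1') -> SID 396
        Map<Integer, Integer> sidToGlyph = new HashMap<>();
        sidToGlyph.put(396, 0);   // SID 396 -> glyph 0

        // The single-byte code resolves; the combined code does not.
        System.out.println(glyphForCode(codeToSid, sidToGlyph, 49));
        System.out.println(glyphForCode(codeToSid, sidToGlyph, 12596));
    }
}
```

Run as is, this prints 12596, 0 and -1: the code 49 reaches glyph 0 through
SID 396, while the combined code 12596 finds nothing, which would be
consistent with the "?" output.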
> ExtractText gets all "?" when pdf 's font is instance of PDType1Font
> --------------------------------------------------------------------
>
> Key: PDFBOX-1770
> URL: https://issues.apache.org/jira/browse/PDFBOX-1770
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 1.8.2
> Reporter: Sean.Sun
> Attachments: The Importance of Symmetry.pdf, The Importance of
> Symmetry.pdf.d2t
>
>
> ExtractText returns only "?" when the font is an instance of PDType1Font, the
> subtype is type1CFont, and fontEncoding is null.