[ https://issues.apache.org/jira/browse/PDFBOX-1770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813833#comment-13813833 ]

Tilman Hausherr edited comment on PDFBOX-1770 at 11/5/13 12:19 PM:
-------------------------------------------------------------------

What do you mean "But I don't why excute"?

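1) Anyway, I took a quick look and I think I understand what you mean. 
PDFStreamEngine.java:380 is weird. From the string "14" it produces a single, 
non-existing character code 12596 (0x3134), which is what you get when the two 
bytes 0x31 ('1') and 0x34 ('4') are combined into one 16-bit value instead of 
being treated as the two codes 49 and 52.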
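To make the arithmetic behind 12596 concrete, here is a small standalone Java sketch. It is not PDFBox code and does not claim to reproduce what line 380 actually does; the class and method names (TwoByteCodeDemo, decodeAsTwoByte, decodeAsOneByte) are hypothetical. It only contrasts reading "14" as one two-byte code with reading it as two one-byte codes, which is what a simple font such as a Type 1 font uses:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class TwoByteCodeDemo {
    // Hypothetical helper: read the bytes as two-byte codes, high byte first.
    // A trailing odd byte is ignored in this sketch.
    public static int[] decodeAsTwoByte(byte[] bytes) {
        int[] codes = new int[bytes.length / 2];
        for (int i = 0; i < codes.length; i++) {
            int hi = bytes[2 * i] & 0xFF;
            int lo = bytes[2 * i + 1] & 0xFF;
            codes[i] = (hi << 8) | lo;
        }
        return codes;
    }

    // Hypothetical helper: every byte is its own character code, as in a
    // simple (non-CID) font.
    public static int[] decodeAsOneByte(byte[] bytes) {
        int[] codes = new int[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            codes[i] = bytes[i] & 0xFF;
        }
        return codes;
    }

    public static void main(String[] args) {
        byte[] text = "14".getBytes(StandardCharsets.US_ASCII);
        System.out.println(Arrays.toString(decodeAsTwoByte(text))); // [12596]
        System.out.println(Arrays.toString(decodeAsOneByte(text))); // [49, 52]
    }
}
```

Since (0x31 << 8) | 0x34 = 12596, the value seen in the debugger is consistent with a two-byte reading of a string that should be decoded one byte at a time.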

2) I also noticed: in PDType1CFont.java, in load(), the variable codeToNameMap 
is filled but then never used. The values of mapping.getCode() (which are 
exactly the character codes used in the PDF) are stored there, so they end up 
unused as well. There should be some logic that maps from mapping.getCode() to 
mapping.getSID(), and I suspect that it involves codeToNameMap. The ideal 
result would be that code 49 somehow leads to 396, because 396 does exist in 
codeToGlyph and maps to the correct glyph 0.
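A minimal sketch of the missing code-to-SID step, assuming that the existing codeToGlyph map is keyed by SID (so 396 maps to glyph 0) and that each encoding entry pairs a PDF character code with a SID. The EncodingEntry class, the sidToGlyph name and buildCodeToGlyph are hypothetical stand-ins, not PDFBox API; only the values 49, 396 and 0 come from the comment above:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CodeToGlyphSketch {
    // Hypothetical stand-in for one encoding entry: the character code used
    // in the PDF (like mapping.getCode()) and its SID (like mapping.getSID()).
    public static final class EncodingEntry {
        final int code;
        final int sid;

        public EncodingEntry(int code, int sid) {
            this.code = code;
            this.sid = sid;
        }
    }

    // Route each character code through its SID into the SID-keyed glyph
    // map, producing a map keyed by the codes that actually appear in the PDF.
    public static Map<Integer, Integer> buildCodeToGlyph(
            List<EncodingEntry> entries, Map<Integer, Integer> sidToGlyph) {
        Map<Integer, Integer> codeToGlyph = new HashMap<>();
        for (EncodingEntry e : entries) {
            Integer glyph = sidToGlyph.get(e.sid);
            if (glyph != null) {
                codeToGlyph.put(e.code, glyph);
            }
        }
        return codeToGlyph;
    }

    public static void main(String[] args) {
        // Values from the comment: code 49 ('1') should lead to 396,
        // and 396 maps to glyph 0.
        Map<Integer, Integer> sidToGlyph = Map.of(396, 0);
        List<EncodingEntry> entries = List.of(new EncodingEntry(49, 396));
        System.out.println(buildCodeToGlyph(entries, sidToGlyph).get(49)); // 0
    }
}
```

The point of the sketch is only the two-step lookup (code, then SID, then glyph); where exactly this belongs in load() is left open, as in the comment.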

3) CFFGlyph2D has the same problem in its constructor as described in 2). 
Adding codeToGlyph.put(mapping.getCode(), glyphId) produces a perfectly 
rendered image, whereas the existing code produces only a lot of "?".
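To illustrate why the extra put() in point 3 changes the output from "?" to real glyphs, here is a simplified model of such a constructor loop. It assumes the map is currently keyed only by SID and that glyph IDs are assigned in order starting at 0; Glyph2DSketch, Mapping, alsoKeyByCode and render() are hypothetical names, not the real CFFGlyph2D code:

```java
import java.util.HashMap;
import java.util.Map;

public class Glyph2DSketch {
    // Hypothetical stand-in for a mapping entry: the PDF character code and
    // the SID that the constructor currently uses as the key.
    public static final class Mapping {
        final int code;
        final int sid;

        public Mapping(int code, int sid) {
            this.code = code;
            this.sid = sid;
        }
    }

    private final Map<Integer, Integer> codeToGlyph = new HashMap<>();

    // Simplified constructor loop: glyph IDs are handed out in order; the
    // flag toggles the additional code-keyed entry suggested in point 3.
    public Glyph2DSketch(Mapping[] mappings, boolean alsoKeyByCode) {
        int glyphId = 0;
        for (Mapping m : mappings) {
            codeToGlyph.put(m.sid, glyphId);
            if (alsoKeyByCode) {
                codeToGlyph.put(m.code, glyphId);
            }
            glyphId++;
        }
    }

    // A code that is not found is drawn as "?", otherwise as "g<glyphId>".
    public String render(int code) {
        Integer glyph = codeToGlyph.get(code);
        return glyph == null ? "?" : "g" + glyph;
    }

    public static void main(String[] args) {
        Mapping[] mappings = { new Mapping(49, 396) };
        System.out.println(new Glyph2DSketch(mappings, false).render(49)); // ?
        System.out.println(new Glyph2DSketch(mappings, true).render(49));  // g0
    }
}
```

One caveat with putting codes and SIDs into the same map: a character code could coincide with an unrelated SID and overwrite its entry. Keeping a separate code-keyed map would avoid that, at the cost of a second lookup structure.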



> ExtractText gets all "?" when pdf 's font is instance of PDType1Font
> --------------------------------------------------------------------
>
>                 Key: PDFBOX-1770
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1770
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.8.2
>            Reporter: Sean.Sun
>         Attachments: The Importance of Symmetry.pdf, The Importance of Symmetry.pdf.d2t
>
>
> ExtractText gets all "?" when font is instanceof PDType1Font and subtype is 
> type1CFont and fontEncoding is null.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
