[ 
https://issues.apache.org/jira/browse/PDFBOX-1770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813833#comment-13813833
 ] 

Tilman Hausherr commented on PDFBOX-1770:
-----------------------------------------

What do you mean by "But I don't why excute"?

Anyway, I took a quick look and I think I understand what you mean. 
PDFStreamEngine.java:380 is weird: from the string "14" it produces a 
nonexistent Unicode character, 12596.
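
For illustration, here is one way the value 12596 can arise from "14". This is 
only a sketch of the arithmetic, assuming the two characters are read as a 
single big-endian two-byte code; it is not the actual PDFStreamEngine code, and 
the class and method names are made up for this example.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of PDFBox.
class TwoByteCodeDemo {

    // Treat the first two bytes of the string as one big-endian 16-bit code.
    static int combineAsTwoByteCode(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1);
        return ((bytes[0] & 0xFF) << 8) | (bytes[1] & 0xFF);
    }

    public static void main(String[] args) {
        // '1' is 0x31 (49) and '4' is 0x34 (52), so the result is 0x3134.
        int code = combineAsTwoByteCode("14");
        System.out.println(code);                      // prints 12596
        System.out.println(Integer.toHexString(code)); // prints 3134
    }
}
```

Read one byte at a time instead, the same string would give the two separate 
codes 49 and 52.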

I also noticed that in load() in PDType1CFont.java, the variable codeToNameMap 
is filled but then never used. mapping.getCode() (which would have exactly the 
char values used in the PDF) is put into it, but never read back. There should 
be some logic that maps from mapping.getCode() to mapping.getSID(), and I 
suspect that it involves codeToNameMap. A perfect result would be that 49 
somehow leads to 396, because 396 does exist in codetoGlyph and maps to the 
correct glyph 0.
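
The kind of lookup I have in mind could look roughly like the sketch below. It 
is only an assumption about how such a mapping might be wired up, not the 
actual PDType1CFont code: the two maps, the class, the method and the glyph 
name "exampleGlyph" are all hypothetical. Only the numbers 49 and 396 come from 
this file.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not part of PDFBox.
class CodeToSidSketch {

    // Two-step lookup: char code -> glyph name -> SID.
    // Returns null if either step has no entry.
    static Integer lookupSid(Map<Integer, String> codeToName,
                             Map<String, Integer> nameToSid,
                             int code) {
        String name = codeToName.get(code);
        return name == null ? null : nameToSid.get(name);
    }

    public static void main(String[] args) {
        // Stand-in for codeToNameMap; the glyph name is invented.
        Map<Integer, String> codeToName = new HashMap<>();
        codeToName.put(49, "exampleGlyph");

        // Assumed second map from glyph name to SID.
        Map<String, Integer> nameToSid = new HashMap<>();
        nameToSid.put("exampleGlyph", 396);

        System.out.println(lookupSid(codeToName, nameToSid, 49)); // prints 396
        System.out.println(lookupSid(codeToName, nameToSid, 52)); // prints null
    }
}
```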

> ExtractText gets all "?" when pdf 's font is instance of PDType1Font
> --------------------------------------------------------------------
>
>                 Key: PDFBOX-1770
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1770
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.8.2
>            Reporter: Sean.Sun
>         Attachments: The Importance of Symmetry.pdf, The Importance of 
> Symmetry.pdf.d2t
>
>
> ExtractText gets all "?" when font is instanceof PDType1Font and subtype is 
> type1CFont and fontEncoding is null.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
