[ https://issues.apache.org/jira/browse/PDFBOX-1129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated PDFBOX-1129:
---------------------------------------

    Attachment: 000086.pdf

PDF document showing the issue.
                
> Quote glyphs (quoteright, quotedblright, etc.) not mapped to the right 
> Unicode character
> ----------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-1129
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1129
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 1.7.0
>            Reporter: Michael McCandless
>            Priority: Minor
>         Attachments: 000086.pdf
>
>
> I have an example PDF (will attach) that uses a right-single-quote
> character, but PDFBox extracts it incorrectly (using ExtractText).
> If I copy/paste, the text is correct (I get U+2019 for the right
> quote).
> Search for "cashier" on page 1 of the PDF to see it; I think that
> right quote is supposed to come through as U+2019.
> I looked at the PDF in PDFDebugger, and I see this fragment in the
> "Contents" for page 1:
>   (Bring the voucher handout to the cashier\325s office \(10-180\))Tj
> So somehow this \325 escape fails to map to the quoteright glyph.  The
> font is a partially embedded font, BPOLKO+TimesNewRomanPSMT, and I can
> see in the Charset (under FontDescriptor, for font F1) that it
> references this glyph.
> I also see a [correct] entry in glyphlist.txt, mapping to U+2019, so
> that's not the problem.
> Not sure what's going wrong... maybe somehow \325 fails to map to
> quoteright? 
> There are other glyphs (quotedblright, quotedblleft) that are also not
> converted correctly, e.g. search for "project review" on page 2.
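
For readers following the chain described above (string escape, then
character code, then glyph name, then Unicode), here is a minimal
Python sketch. It is not PDFBox code and not a diagnosis of the bug.
The helper names (string_codes, to_text) are hypothetical, the tables
are tiny hand-copied excerpts of the PDF base encodings and of
glyphlist.txt, and which encoding font F1 in 000086.pdf actually
declares is an assumption, since the report does not say.

```python
import re

# Excerpts only (character code -> glyph name); the full tables are in
# the character-set appendix of the PDF specification.
MAC_ROMAN = {0o322: "quotedblleft", 0o323: "quotedblright",
             0o324: "quoteleft", 0o325: "quoteright"}
WIN_ANSI = {0o222: "quoteright", 0o325: "Otilde"}

# Excerpt of glyphlist.txt entries (glyph name -> Unicode code point).
GLYPH_LIST = {"quoteleft": 0x2018, "quoteright": 0x2019,
              "quotedblleft": 0x201C, "quotedblright": 0x201D,
              "Otilde": 0x00D5}

ESCAPE = re.compile(r"\\([0-7]{1,3}|.)")
SIMPLE = {"n": 10, "r": 13, "t": 9, "b": 8, "f": 12}


def string_codes(body):
    """Resolve backslash escapes in the body of a PDF literal string
    (the text between the outer parentheses) into character codes."""
    codes, pos = [], 0
    for m in ESCAPE.finditer(body):
        codes.extend(ord(c) for c in body[pos:m.start()])
        tok = m.group(1)
        if tok[0] in "01234567":
            codes.append(int(tok, 8) & 0xFF)         # \ddd octal escape
        else:
            codes.append(SIMPLE.get(tok, ord(tok)))  # \( \) \\ \n ...
        pos = m.end()
    codes.extend(ord(c) for c in body[pos:])
    return codes


def to_text(codes, encoding):
    """Map codes to text via glyph names; codes missing from the
    excerpt tables fall back to chr(code) (a simplification)."""
    return "".join(
        chr(GLYPH_LIST[encoding[c]]) if c in encoding else chr(c)
        for c in codes)


BODY = r"Bring the voucher handout to the cashier\325s office \(10-180\)"
print(ascii(to_text(string_codes(BODY), MAC_ROMAN)))
print(ascii(to_text(string_codes(BODY), WIN_ANSI)))
```

Octal 325 is code 213 (0xD5). That code names quoteright in
MacRomanEncoding but Otilde in WinAnsiEncoding, so the extracted
character depends on which encoding is applied to the font before the
glyphlist.txt lookup happens.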

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
