[jira] [Commented] (PDFBOX-1824) [PATCH] CFF fonts render wrong glyphs

JIRA Fri, 03 Jan 2014 11:06:12 -0800

    [ 
https://issues.apache.org/jira/browse/PDFBOX-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861784#comment-13861784
 ]


Andreas Lehmkühler commented on PDFBOX-1824:
--------------------------------------------

[~jahewson] Thanks for the fast patch. I have one too, but I'm still testing 
it to avoid side effects.

The remaining issue may be related to PDFBOX-1691.

> [PATCH] CFF fonts render wrong glyphs
> -------------------------------------
>
>                 Key: PDFBOX-1824
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1824
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 2.0.0
>            Reporter: John Hewson
>            Assignee: Andreas Lehmkühler
>              Labels: patch
>             Fix For: 2.0.0
>
>         Attachments: 1.patch, 2.patch, 3.patch, 
> Bimbo_Historia_20070409_Esp.pdf-2-rev-1554775.png, 
> Bimbo_Historia_20070409_Esp.pdf-2-rev-current.png, all.patch, 
> bimbo_historia-patched.jpg, bimbo_historia.patch, calluna-11.pdf, 
> patched.jpg, trunk.jpg
>
>
> I've found three very closely related CFF encoding issues in v2.0.0 when 
> using PDFToImage.
> Problem 1
> ---------
> Look at line 7 of the poem: it should read "And the mouldering dust that 
> years have made"
> but instead reads "Afld the fioulderiflg dust that years have fiade".
> The CFF font is assumed to use CIDs, but it does not if it's not a ROS font.
> Therefore we add a check for the CFF ROS class.
> Patch 1 fixes this.
> Problem 2
> ---------
> Look at line 3: "of right shoice" should be "of right choice".
> Likewise, on line 2 of the 2nd paragraph, "And a staunsh" should be "And a 
> staunch";
> the st and ch ligatures are incorrect.
> This is because the font is a CFF ROS CID font and the glyphs for the st and 
> ch ligatures
> both have no name. The CFF format achieves this by using SIDs beyond the size 
> of the string
> index, which map to .notdef. So there is a unique SID for each glyph, but not 
> a unique name.
> Unfortunately, PDFBox assumes that Type 1 fonts have glyphs with unique names, 
> and this
> assumption appears throughout the codebase. Because a glyph name and a SID 
> perform essentially
> the same role, I recommend a simple solution to the problem: when an SID 
> beyond the size of
> the string index is encountered, instead of mapping it to .notdef it should 
> be mapped to 
> a new name with the prefix "SID", for example mapping SID 409 to the name 
> "SID409". That way
> each glyph will have a unique name, which is what PDFBox assumes.
> Patch 2 fixes this.
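
A minimal Java sketch of the naming rule proposed under Problem 2, to make the idea concrete. This is not the code from the patch: the class `SidNames`, the method `nameForSid`, and the single merged `strings` table are hypothetical names chosen for illustration, and the sketch assumes the font's SID-to-name table is already available as one list indexed by SID.

```java
import java.util.List;

// Hypothetical helper, not part of PDFBox.
final class SidNames {
    private SidNames() {
    }

    /**
     * Returns a glyph name for the given SID.
     *
     * @param sid     the string identifier of the glyph
     * @param strings assumed: the font's SID-to-name table, indexed by SID
     */
    static String nameForSid(int sid, List<String> strings) {
        if (sid < 0) {
            throw new IllegalArgumentException("negative SID: " + sid);
        }
        if (sid < strings.size()) {
            // The SID is inside the table, so the font supplies a real name.
            return strings.get(sid);
        }
        // The SID is beyond the table: synthesize a unique name
        // instead of collapsing every such glyph to ".notdef".
        return "SID" + sid;
    }
}
```

With a table shorter than 409 entries, `SidNames.nameForSid(409, strings)` yields "SID409" and `SidNames.nameForSid(410, strings)` yields "SID410", so two unnamed glyphs no longer share a name.
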
> Problem 3
> ---------
> Look at line 2, "That creepeth oÉer ruins old!": the word "o'er" is 
> incorrectly rendered
> as "oÉer". This is because the Encoding entry in the PDF remaps code 201 from 
> "Eacute" in the
> base encoding to "quoteright", but this is being ignored by PDFBox.
> In the CFFGlyph2D constructor, PDFBox examines the font's built-in charset. 
> When the name
> "quoteright" is encountered, it is looked up in the PDF Encoding (i.e. 
> nameToCode), where
> it is changed to code 201. Thus code 201 is associated with the "quoteright" 
> glyph in the
> codeToGlyph map. This is correct.
> However, later, when the "Eacute" glyph is encountered, its built-in charset 
> code is also
> 201 (which is standard), and so the codeToGlyph map entry is overwritten, 
> resulting in
> code 201 being associated with the "Eacute" glyph.
> The solution is to build the codeToGlyph map in a strict order: first 
> populate it with the
> font's built-in charset, then let the PDF Encoding overwrite any entries 
> that it defines.
> Patch 3 fixes this (and also replaces patch 2).
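
The ordering described under Problem 3 can be sketched in Java as follows. This is an illustration under stated assumptions, not the actual CFFGlyph2D code: the class `CodeToGlyphBuilder` and its `build` method are hypothetical, glyphs are represented by their names rather than by glyph objects, and both inputs are assumed to be already available as code-to-name maps.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of PDFBox.
final class CodeToGlyphBuilder {
    private CodeToGlyphBuilder() {
    }

    /**
     * Builds the code-to-glyph map in a strict order.
     *
     * @param builtIn     assumed: code -> glyph name from the font's built-in charset
     * @param pdfEncoding assumed: code -> glyph name defined by the PDF Encoding entry
     */
    static Map<Integer, String> build(Map<Integer, String> builtIn,
                                      Map<Integer, String> pdfEncoding) {
        // Step 1: start from the font's own mapping.
        Map<Integer, String> codeToGlyph = new LinkedHashMap<>(builtIn);
        // Step 2: the PDF Encoding overwrites any code it defines,
        // and it is applied last so nothing can overwrite it afterwards.
        codeToGlyph.putAll(pdfEncoding);
        return codeToGlyph;
    }
}
```

For the case in this issue, a built-in map containing 201 -> "Eacute" combined with a PDF Encoding containing 201 -> "quoteright" leaves code 201 pointing at "quoteright", while codes the PDF Encoding does not mention keep their built-in glyphs.
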



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
