[jira] [Commented] (PDFBOX-2262) Remove usage of AWT fonts

John Hewson (JIRA) Sun, 17 Aug 2014 13:57:40 -0700

    [ 
https://issues.apache.org/jira/browse/PDFBOX-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14100106#comment-14100106
 ]


John Hewson commented on PDFBOX-2262:
-------------------------------------

Ok, I've committed my code which removes AWT font rendering. It ended up being
a huge commit, because so much of the font locating, parsing, encoding, and
rendering code is closely tied together. I've touched around 5000 lines of
code, so I've put it in its own branch
[no-awt|http://svn.apache.org/viewvc/pdfbox/branches/no-awt/] for testing.

There were two major unexpected aspects to this commit:
- Most systems don't have .pfb fonts, so we have to use TTFs to substitute the
standard 14 fonts. However, the current design of PDFBox didn't allow this, so
I added the Type1Equivalent interface, which works with TTF, CFF/Type1, and
PFB/Type1 fonts.
- CFF font handling had to be refactored to work with the new Type 1-equivalent 
font handling. For example, pretty much all usages of SIDs needed removing.
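
As a rough illustration of the substitution idea in the first point (not the
code in the branch), a lookup from a standard 14 PostScript name to an
installed system font could look like the sketch below. The class name, the
resolve method, and the candidate font names in the table are all assumptions
made for this example; they are not the PDFBox API or its built-in mappings.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch: the class, method, and table contents are assumptions,
// not the PDFBox API or its built-in mappings.
final class Standard14Substitutes {
    // Standard 14 PostScript name -> candidate substitutes, in preference order.
    private static final Map<String, List<String>> CANDIDATES = new HashMap<>();
    static {
        CANDIDATES.put("Helvetica", Arrays.asList("ArialMT", "LiberationSans"));
        CANDIDATES.put("Times-Roman", Arrays.asList("TimesNewRomanPSMT", "LiberationSerif"));
        CANDIDATES.put("Courier", Arrays.asList("CourierNewPSMT", "LiberationMono"));
    }

    // Returns the PostScript name of the font to load, if any is installed.
    static Optional<String> resolve(String postScriptName, Set<String> installed) {
        // An exact match on the requested PostScript name always wins.
        if (installed.contains(postScriptName)) {
            return Optional.of(postScriptName);
        }
        // Otherwise take the first installed candidate from the table.
        List<String> candidates =
                CANDIDATES.getOrDefault(postScriptName, Collections.<String>emptyList());
        for (String candidate : candidates) {
            if (installed.contains(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }
}
```

The caller would then load whichever font file matches the returned name,
whether it is a TTF or a PFB, which is why a common Type 1-equivalent view of
both formats is needed.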

In addition to the commit message above, here are some further details:

- FontBox's NamingTable#getFontSubFamily ignored the language id, so for any
font where a non-English name was provided after an English name in the table,
the final, non-English name was returned. Instead of "Bold" we got "Gras",
which resulted in failed substitutions.
- Moved all code from PDFont which used AFM metrics to PDType1Font, the only 
subclass which uses them.
- CFFCharset was fundamentally broken. A charset should be a map from GID =>
SID/CID, which allows us to look up the name of a given glyph using the String
INDEX, which maps a SID => Name (a CID is already its own name). However,
Villu's charset was a map from SID/CID => Name, which is redundant and doesn't
actually fulfil the role of a charset, which is to map GIDs to SIDs/CIDs. This
led to some very strange "Rube Goldberg" code in CFFFont to try to determine
mappings. The CFFEncoding class had a similar problem.
- CFFFont was treating CID fonts as StandardEncoding, but the spec is quite
clear that in this case "the default StandardEncoding should not be applied".
In fact, CID fonts do not have any Encoding, because they use CIDs instead of
character codes.
- All of the parsing of encodings and charsets in CFFParser was off-by-one.
- The entire concept of Mapping, CFFMapping and Type1Mapping has been removed, 
as they did not reflect how Encoding and CIDs actually work. The CFFEncoding 
and CFFCharset classes now handle these duties, as that's their purpose within 
the CFF format - this vastly simplifies working with CFF fonts.
- CFFFont has been split into two subclasses, CFFType1Font for Type 
1-equivalent fonts and CFFCIDFont for CID-Keyed fonts. These new classes make 
it impossible to mix up CIDs/SIDs/names for glyphs.
- PDAfmPfbFont has been removed, replaced by PDType1FontEmbedder which creates 
PDType1Font instances for embedding. This removes an unnecessary subclass of 
PDType1Font and cleanly separates code for embedding new fonts from code for 
parsing existing embedded fonts.
- Likewise, code for embedding new TrueType fonts has been moved to 
PDTrueTypeFontEmbedder to avoid what was becoming a very confusing 
PDTrueTypeFont class.
- SystemFontManager has been removed. It only worked for TTF fonts, but we
needed pfb/ttf/otf support. It was easier to remove this class and replace it,
firstly so that the font manager can be pluggable as in PDFBOX-2144, and
secondly because normalised font names were being used for lookup, which is
unnecessary: the PostScript name of the font is always available and makes for
simpler mappings. The new class which replaces it is FileSystemFontProvider.
Likewise, PDFFontManager has been replaced by the ExternalFonts class, which
is a service provider interface for the pluggable FontProvider. It also
includes the built-in mappings for the standard 14 fonts to system fonts. By
using PostScript names for font mapping, the problem of substituting fonts
actually becomes quite simple.
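
To make the charset point above concrete, here is a minimal sketch of the
GID => SID => Name lookup for a Type 1-equivalent CFF font. It is not the
FontBox implementation: CharsetSketch, its methods, and the plain map standing
in for the String INDEX (together with the predefined standard strings) are
assumptions made for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the FontBox CFFCharset class.
final class CharsetSketch {
    // The charset proper: index = GID, value = SID.
    private final int[] gidToSid;
    // Stand-in for the String INDEX plus standard strings: SID -> glyph name.
    private final Map<Integer, String> sidToName;

    CharsetSketch(int[] gidToSid, Map<Integer, String> sidToName) {
        this.gidToSid = gidToSid.clone();
        this.sidToName = new HashMap<>(sidToName);
    }

    // The only question a charset has to answer: which SID does this GID have?
    int sidForGid(int gid) {
        if (gid < 0 || gid >= gidToSid.length) {
            throw new IllegalArgumentException("no such GID: " + gid);
        }
        return gidToSid[gid];
    }

    // Glyph name = string lookup of the SID the charset assigns to the GID.
    String nameForGid(int gid) {
        String name = sidToName.get(sidForGid(gid));
        return name != null ? name : ".notdef";
    }

    // Reverse lookup by linear scan; falls back to GID 0, which is .notdef.
    int gidForName(String name) {
        for (int gid = 0; gid < gidToSid.length; gid++) {
            if (name.equals(sidToName.get(gidToSid[gid]))) {
                return gid;
            }
        }
        return 0;
    }
}
```

For a CID-keyed font the second step disappears, since the value stored per
GID is a CID and a CID is already its own name.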

This code is ready for testing, post your bugs here :)

> Remove usage of AWT fonts
> -------------------------
>
>                 Key: PDFBOX-2262
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2262
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: PDModel, Rendering
>    Affects Versions: 2.0.0
>            Reporter: John Hewson
>            Assignee: John Hewson
>
> We're still using AWT fonts to render the "standard 14" built-in fonts, which 
> causes rendering problems and encoding issues (see  PDFBOX-2140). We're also 
> using AWT for some fallback fonts.
> Removal of these AWT fonts isn't too difficult, we need to load the fonts 
> using the existing PDFFontManager mechanism which has recently been added. 
> All missing TrueType fonts loaded from disk have been using SystemFontManager 
> for a number of weeks now. 
> We should ship some sensible default fonts with PDFBox, such as the 
> Liberation fonts (see PDFBOX-2169, PDFBOX-2263), in case PDFFontManager can't 
> find anything suitable, rather than falling back to the default TTF font, but 
> by default we'll probe the system for suitable fonts.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
