[ https://issues.apache.org/jira/browse/PDFBOX-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14100106#comment-14100106 ]
John Hewson commented on PDFBOX-2262:
-------------------------------------

Ok, I've committed my code which removes AWT font rendering. It ended up being a huge commit, because so much of the font locating, parsing, encoding, and rendering code is closely tied together. I've touched around 5000 lines of code; for this reason I've put it in its own branch [no-awt|http://svn.apache.org/viewvc/pdfbox/branches/no-awt/] for testing.

There were two major unexpected aspects to this commit:

- Most systems don't have .pfb fonts, so we have to use TTFs to substitute the standard 14 fonts. However, the current design of PDFBox didn't allow this, so I added the Type1Equivalent interface which works with TTF, CFF/Type1, and PFB/Type1 fonts.
- CFF font handling had to be refactored to work with the new Type 1-equivalent font handling. For example, pretty much all usages of SIDs needed removing.

In addition to the commit message above, here are some further details:

- FontBox's NamingTable#getFontSubFamily ignored the language id, so for any font where a non-English name was provided after an English name in the table, the final, non-English name was returned. Instead of "Bold" we got "Gras", which resulted in failed substitutions.
- Moved all code from PDFont which used AFM metrics to PDType1Font, the only subclass which uses them.
- CFFCharset was fundamentally broken. A charset should be a map from GID => SID/CID, which allows us to look up the name of a given glyph using the String INDEX, which maps a SID => Name (a CID is already its own name). However, Villu's charset was a map from SID/CID => Name, which is redundant and doesn't actually fulfil the role of a charset, which is to map GIDs to SIDs/CIDs. This led to some very strange "Rube Goldberg" code in CFFFont to try to determine mappings. The CFFEncoding class had a similar problem.
- CFFFont was treating CID fonts as StandardEncoding, but the spec is quite clear that in this case "the default StandardEncoding should not be applied". In fact, CID fonts do not have any Encoding, because they use CIDs instead of character codes.
- All of the parsing of encodings and charsets in CFFParser was off by one.
- The entire concept of Mapping, CFFMapping and Type1Mapping has been removed, as they did not reflect how Encoding and CIDs actually work. The CFFEncoding and CFFCharset classes now handle these duties, as that's their purpose within the CFF format - this vastly simplifies working with CFF fonts.
- CFFFont has been split into two subclasses, CFFType1Font for Type 1-equivalent fonts and CFFCIDFont for CID-Keyed fonts. These new classes make it impossible to mix up CIDs/SIDs/names for glyphs.
- PDAfmPfbFont has been removed, replaced by PDType1FontEmbedder which creates PDType1Font instances for embedding. This removes an unnecessary subclass of PDType1Font and cleanly separates code for embedding new fonts from code for parsing existing embedded fonts.
- Likewise, code for embedding new TrueType fonts has been moved to PDTrueTypeFontEmbedder to avoid what was becoming a very confusing PDTrueTypeFont class.
- SystemFontManager has been removed. It only worked for TTF fonts, but we needed pfb/ttf/otf support. It was easier to remove this class and replace it: firstly so that the font manager can be pluggable as in PDFBOX-2144, and secondly because normalised font names were being used for lookup, which is unnecessary, as the PostScript name of the font is always available and makes for simpler mappings. The new class which replaces it is FileSystemFontProvider. Likewise, PDFFontManager has been replaced by the ExternalFonts class, which is a service provider interface for the pluggable FontProvider. It also includes the built-in mappings for the standard 14 fonts to system fonts.
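
To make the charset point above concrete, here is a minimal sketch of the lookup chain GID => SID => Name for a Type 1-equivalent CFF font. This is not the FontBox code: the class CffCharsetSketch and all of its methods are hypothetical names made up for illustration, and the small SID table is a stand-in for the CFF standard strings plus the font's String INDEX.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical illustration only, not a FontBox class.
 * Models a charset as GID => SID and resolves names via a SID => Name table.
 */
public class CffCharsetSketch {

    /** The charset proper: array index is the GID, value is the SID. */
    private final int[] gidToSid;

    /** SID => glyph name; stands in for standard strings + String INDEX. */
    private final Map<Integer, String> sidToName;

    public CffCharsetSketch(int[] gidToSid, Map<Integer, String> sidToName) {
        this.gidToSid = gidToSid.clone();
        this.sidToName = new HashMap<>(sidToName);
    }

    /** GID => SID => Name. Unknown GIDs or SIDs fall back to ".notdef". */
    public String nameForGid(int gid) {
        if (gid < 0 || gid >= gidToSid.length) {
            return ".notdef";
        }
        String name = sidToName.get(gidToSid[gid]);
        return name != null ? name : ".notdef";
    }

    /** Reverse lookup: Name => GID. Returns 0 (.notdef) when not found. */
    public int gidForName(String name) {
        for (int gid = 0; gid < gidToSid.length; gid++) {
            if (name.equals(sidToName.get(gidToSid[gid]))) {
                return gid;
            }
        }
        return 0;
    }

    /** Builds a toy three-glyph font; the data is illustrative only. */
    public static CffCharsetSketch demo() {
        Map<Integer, String> strings = new HashMap<>();
        strings.put(0, ".notdef");
        strings.put(34, "A");
        strings.put(35, "B");
        // GID 0 => SID 0, GID 1 => SID 34, GID 2 => SID 35
        return new CffCharsetSketch(new int[] {0, 34, 35}, strings);
    }

    public static void main(String[] args) {
        CffCharsetSketch charset = demo();
        System.out.println(charset.nameForGid(1));
        System.out.println(charset.gidForName("B"));
    }
}
```

For a CID-keyed font the second step disappears: the charset would map GID => CID and the CID is used directly, which is why keeping the two cases in separate classes avoids mixing up CIDs, SIDs and names.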

By using PostScript names for font mapping the problem of substituting fonts actually becomes quite simple.

This code is ready for testing, post your bugs here :)

> Remove usage of AWT fonts
> -------------------------
>
>                 Key: PDFBOX-2262
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2262
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: PDModel, Rendering
>    Affects Versions: 2.0.0
>            Reporter: John Hewson
>            Assignee: John Hewson
>
> We're still using AWT fonts to render the "standard 14" built-in fonts, which
> causes rendering problems and encoding issues (see PDFBOX-2140). We're also
> using AWT for some fallback fonts.
> Removal of these AWT fonts isn't too difficult, we need to load the fonts
> using the existing PDFFontManager mechanism which has recently been added.
> All missing TrueType fonts loaded from disk have been using SystemFontManager
> for a number of weeks now.
> We should ship some sensible default fonts with PDFBox, such as the
> Liberation fonts (see PDFBOX-2169, PDFBOX-2263), in case PDFFontManager can't
> find anything suitable, rather than falling back to the default TTF font, but
> by default we'll probe the system for suitable fonts.

--
This message was sent by Atlassian JIRA
(v6.2#6252)