Even if using installed fonts becomes an option to save space in the file or data stream, embedding fonts still needs to remain an option. I assign specific fonts from specific files to get consistent output, so everything must be embedded. I don't want to have to care what is installed where. I'll be glad to be rid of a headache I've had since Windows 98 with Courier New: different PCs with the same OS had different font files, and rendering on the server versus the client broke when one side did not have the font installed or had a different font installed under the same name.
The problem I'm currently having with output is rendering special Unicode glyphs. I sent the code point U+25AB with the font file LTYPE.TTF, which came installed with Windows XP. In FOP 0.95 it produced a square, which is what I want. That character is supposed to be a square. If I'm wrong and that character is not in the font, then the square was the default output for "character not found". When I tried FOP 1.0, the same code produced a pound sign (#). I'd like to be able to run a routine through FOP to get a list of all Unicode code points and the characters they map to for a particular font.

The biggest problem I'm having running FOP 0.95 is the threading. I've tried calling it from a Java SwingWorker and that does not resolve the issue. I'm running a javax.swing.JProgressBar in indeterminate mode and it freezes while I'm transforming FOP output, so the users think the program is stuck and I have to explain to them that it's supposed to do that the first time. If they run it twice in a row, the second run is much smoother.

Getting smaller results is nice but not necessarily a priority. Reducing a 2 MB file to 35 K is high priority. Reducing a 46 K file to 35 K is not a big deal. Getting consistent output is top priority.

-----Original Message-----
From: Jeremias Maerki [mailto:[email protected]]
Sent: Thursday, November 11, 2010 3:35 PM
To: [email protected]
Subject: Re: TrueType Font Embedding

Hi Chris

I fully understand the desire to install the font on a PostScript printer to keep the PS files smaller. To answer your question: I did not ask for the business use case. The problem I'm struggling with in this context is how to know about the CID meaning of the font, i.e. the multi-byte encoding of the font. When we do subsets in FOP, we re-index the glyphs starting with index 1 (or 3) by occurrence in the document. Only FOP knows which Unicode character is represented by which CID. That's why we need the ToUnicode CMap in PDF. Otherwise, text extraction would not be so easy.
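
To make the re-indexing concrete, here is a minimal Java sketch of the idea described above. It is not FOP's actual code: the class name SubsetReindexSketch and both method names are hypothetical, and it assumes one glyph per Unicode code point, which real fonts do not guarantee. It assigns CIDs in order of first occurrence and then inverts that table, which is roughly the information a ToUnicode CMap has to carry.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration only; these are not FOP classes. */
final class SubsetReindexSketch {

    /**
     * Assigns a CID to each distinct code point in order of first occurrence.
     * firstCid is a parameter because the mail mentions starting at 1 (or 3).
     * Assumption: exactly one glyph per code point.
     */
    static Map<Integer, Integer> assignCids(String text, int firstCid) {
        Map<Integer, Integer> unicodeToCid = new LinkedHashMap<>();
        int next = firstCid;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (!unicodeToCid.containsKey(cp)) {
                unicodeToCid.put(cp, next++);
            }
            i += Character.charCount(cp);
        }
        return unicodeToCid;
    }

    /** Inverts the table: CID to code point, as needed for text extraction. */
    static Map<Integer, Integer> toUnicode(Map<Integer, Integer> unicodeToCid) {
        Map<Integer, Integer> cidToUnicode = new LinkedHashMap<>();
        for (Map.Entry<Integer, Integer> e : unicodeToCid.entrySet()) {
            cidToUnicode.put(e.getValue(), e.getKey());
        }
        return cidToUnicode;
    }

    public static void main(String[] args) {
        // Sample text: A, WHITE SMALL SQUARE, B, then A again.
        Map<Integer, Integer> cids = assignCids("A\u25ABBA", 1);
        for (Map.Entry<Integer, Integer> e : toUnicode(cids).entrySet()) {
            System.out.printf("CID %d -> U+%04X%n", e.getKey(), e.getValue());
        }
    }
}
```

The CIDs depend entirely on the order in which characters happen to appear in one document, so a second document using the same font would generally produce a different table. That is why a subset font cannot simply be referenced from elsewhere without the mapping that was generated alongside it.
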
In single-byte mode, the whole font is embedded (right now probably with the same problems I've just fixed with rev1034094 for the TTF subset). In this mode the Adobe character names map into the font, so 8-bit encodings can be built to properly address the right characters even if the font is not embedded. That's also how we currently do referenced TTF fonts for PDF output.

If we fully embed the font as a CID font, we currently lose the knowledge about which index represents which Unicode character. Combining the font with a suitable CMap resolves the problem, but at the moment we only use Identity-H, which is a 1:1 mapping. One solution would be to turn the Unicode "cmap" table in the TrueType font into a custom PS CMap and then use 16-bit Unicode characters directly. FOP currently doesn't support that.

Also, if some PS platform allows uploading naked TrueType fonts, how will they be represented in the PS VM? Are they CID fonts then, or single-byte fonts? If they are CID fonts, which CID system are they following? I have no idea. The only way to be sure about this is by installing a CID font plus CMap that is generated by FOP (which can be done by extracting these resources from one of the PS streams). After that, the font can be referenced, but it may not be portable to other PS-generating applications.

And then, as Glen mentioned, we have to have a strategy to deal with glyphs with no representation in Unicode. I think I get where he is going with that, and it seems to be close to the CMap I mentioned above that is derived from the Unicode "cmap" table in the TrueType font. At any rate, FOP then has to learn to output Unicode characters (including private area chars) instead of arbitrary CIDs coming from subsetting.

In the end, I'm not 100% sure I've understood all the implications here. I hope we'll get there soon. I guess a Wiki page would do us good here.

Jeremias Maerki
