Even if using installed fonts becomes an option to save space in the
file / data stream, embedding fonts still needs to remain an option.
I assign specific fonts from specific files to get consistent output,
so everything must be embedded.  I don't want to have to care what is
installed where.  I am glad to be rid of a headache I had on Windows 98
with Courier New: different PCs running the same OS had different font
files, and rendering on the server versus the client broke when one
side did not have the font installed or had a different font installed
under the same name.

The problem I'm currently having with output is rendering special
Unicode glyphs.  I sent the character U+25AB using the font file
LTYPE.TTF, which came installed with Windows XP.  In FOP 0.95 it
produced a square, which is what I want; that character is supposed to
be a square.  If I'm wrong and that character is not in the font, then
the square was the default glyph printed for a character that was not
found.  I'd like to be able to run a routine through FOP that lists all
the Unicode code points in a particular font and the characters they
map to.  When I tried FOP 1.0, that same code produced a pound sign (#).
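
A minimal sketch of such a listing routine, written in plain Java rather
than through FOP.  The class name GlyphCoverage and the method
supported() are made up for illustration.  The check is passed in as a
predicate; with java.awt.Font it could be Font.canDisplay(int), which
reports what the Java font loader sees in the file and may not match
what a given FOP version does with the same font.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical helper, not part of FOP.
class GlyphCoverage {

    /**
     * Returns "U+XXXX" labels for every defined code point in
     * [from, to] that the supplied predicate accepts.
     */
    static List<String> supported(IntPredicate canDisplay, int from, int to) {
        List<String> out = new ArrayList<>();
        for (int cp = from; cp <= to; cp++) {
            if (Character.isDefined(cp) && canDisplay.test(cp)) {
                out.add(String.format("U+%04X", cp));
            }
        }
        return out;
    }

    // Possible use with a font file (the path is an assumption):
    //   java.awt.Font f = java.awt.Font.createFont(
    //       java.awt.Font.TRUETYPE_FONT, new java.io.File("LTYPE.TTF"));
    //   supported(cp -> f.canDisplay(cp), 0x2500, 0x25FF)
    //       .forEach(System.out::println);
}
```

If U+25AB does not show up in the list for LTYPE.TTF, the square seen
in 0.95 was most likely the font's "not found" glyph.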

The biggest problem I'm having running FOP 0.95 is the threading.  I've
tried calling it from a Java SwingWorker, but that does not resolve the
issue.  I show a javax.swing.JProgressBar in indeterminate mode and it
freezes while FOP is transforming the output, so the users think the
program is just stuck and I have to explain to them that this is
expected the first time.  If they run it twice in a row, the second run
is much smoother.
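
For reference, this is roughly the SwingWorker shape I would expect to
keep the progress bar moving.  It is only a sketch: RenderWorker is a
made-up name, and the Callable stands in for whatever code drives the
FOP transformation.  One thing that can cause exactly this freeze is
calling get() (or run()) on the worker from the event dispatch thread
instead of execute(); whether that is what happens in my case I can't
say for sure.

```java
import java.util.concurrent.Callable;
import java.util.function.Consumer;
import javax.swing.SwingWorker;

// Hypothetical wrapper: "job" stands in for the FOP transformation.
class RenderWorker extends SwingWorker<byte[], Void> {
    private final Callable<byte[]> job;
    private final Consumer<byte[]> onSuccess;

    RenderWorker(Callable<byte[]> job, Consumer<byte[]> onSuccess) {
        this.job = job;
        this.onSuccess = onSuccess;
    }

    @Override
    protected byte[] doInBackground() throws Exception {
        // Runs on a worker thread, so the event dispatch thread stays
        // free to repaint the indeterminate JProgressBar.
        return job.call();
    }

    @Override
    protected void done() {
        // Runs on the event dispatch thread after the job has finished,
        // so get() returns immediately here.
        try {
            onSuccess.accept(get());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
```

From the button handler, call new RenderWorker(job, callback).execute()
and return right away; stop the progress bar inside the callback.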

Getting smaller results is nice but not necessarily a priority.
Reducing a 2 MB file to 35 K is high priority.  Reducing a 46 K file to
35 K is not a big deal.  Getting consistent output is top priority.


-----Original Message-----
From: Jeremias Maerki [mailto:[email protected]] 
Sent: Thursday, November 11, 2010 3:35 PM
To: [email protected]
Subject: Re: TrueType Font Embedding

Hi Chris

I fully understand the desire to install the font on a PostScript
printer to keep the PS files smaller. To answer your question: I did not
ask for the business use case. The problem I'm struggling with in this
context is how to know about the CID meaning of the font, i.e. the
multi-byte encoding of the font.

When we do subsets in FOP, we re-index the glyphs starting with index 1
(or 3) by occurrence in the document. Only FOP knows which Unicode
character is represented by which CID. That's why we need the ToUnicode
CMap in PDF. Otherwise, text extraction would not be so easy.
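
To illustrate the re-indexing idea, here is a simplified sketch. It is
not FOP's actual code: the class SubsetIndex is hypothetical, it maps
Unicode code points directly to new indices (skipping the original
glyph index step), and it assumes numbering starts at 1.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of subset re-indexing, not FOP code.
class SubsetIndex {
    private final Map<Integer, Integer> cidByCodePoint = new HashMap<>();
    private final Map<Integer, Integer> codePointByCid = new TreeMap<>();
    private int nextCid = 1; // assumption: numbering starts at 1

    /** Returns the CID for a code point, assigning the next free one on first use. */
    int cidFor(int codePoint) {
        Integer cid = cidByCodePoint.get(codePoint);
        if (cid == null) {
            cid = nextCid++;
            cidByCodePoint.put(codePoint, cid);
            codePointByCid.put(cid, codePoint);
        }
        return cid;
    }

    /** Encodes text as CIDs in order of first occurrence. */
    int[] encode(String text) {
        return text.codePoints().map(this::cidFor).toArray();
    }

    /** The reverse mapping, i.e. the information a ToUnicode CMap carries. */
    Map<Integer, Integer> toUnicode() {
        return codePointByCid;
    }
}
```

The same character gets a different CID in every document, which is why
the reverse mapping has to travel with the output.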

In single-byte mode, the whole font is embedded (right now probably with
the same problems I've just fixed with rev1034094 for the TTF subset).
In this mode the Adobe character names map into the font, so 8-bit
encodings can be built to properly address the right characters even if
the font is not embedded. That's also how we currently do referenced TTF
fonts for PDF output.

If we fully embed the font as a CID font, we currently lose the
knowledge about which index represents which Unicode character.
Combining the font with a suitable CMap resolves the problem but at the
moment we only use Identity-H which is a 1:1 mapping. One solution would
be to turn the Unicode "cmap" table in the TrueType font into a custom
PS CMap and then use 16-bit Unicode characters directly. FOP currently
doesn't support that.

Also, if some PS platform allows uploading naked TrueType fonts, how
will they be represented in the PS VM? Are they CID fonts then or
single-byte fonts? If they are CID fonts, which CID system are they
following? I have no idea. The only way to be sure about this is by
installing a CID font plus CMap that is generated by FOP (which can be
done by extracting these resources from one of the PS streams). After
that, the font can be referenced, but it may not be portable to other
PS-generating applications.

And then, as Glen mentioned we have to have a strategy to deal with
glyphs with no representation in Unicode. I think I get where he goes
with that and it seems to be close to the CMap I mentioned above that is
derived from the Unicode "cmap" table in the TrueType font. At any rate,
FOP then has to learn to output Unicode characters (including private
area chars) instead of arbitrary CIDs coming from subsetting.

In the end, I'm not 100% sure I've understood all implications here. I hope
we'll get there soon. I guess a Wiki page would do us good here.



Jeremias Maerki
