Re: TrueType Font Embedding

Vincent Hennebert Tue, 16 Nov 2010 08:49:18 -0800

Installing a font on a printer is one problem; post-processing
a PostScript file is another.


It is indeed an issue to determine how to reference a font that has been
manually installed on a printer. I tried once to install a TrueType font
on a Xerox printer, and the Xerox utility I used for that tried to
convert it. Into what? No idea.

I tried to reference the Kochi Gothic font manually installed on an HP
printer, and using the PostScript name (Kochi-Gothic) didn’t work.
Printing the font list was giving ‘Kochi Gothic’ with the space in
between and AFAIK it’s not possible to use a space in a font name in
PostScript.
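
For what it's worth, a name containing a space cannot be written as
a literal (/Kochi Gothic scans as the name /Kochi followed by Gothic),
but the PostScript cvn operator can build such a name from a string.
Whether the HP printer would then resolve the font is untested here.
A minimal Python sketch of a generator that picks the right form;
ps_font_reference is a hypothetical helper, not FOP code:

```python
import re

# Whitespace and PostScript delimiters end a token, so a name containing
# any of them cannot be written as a /literal name.
_NEEDS_CVN = re.compile(r"[\s()<>\[\]{}/%]")


def _ps_string(text: str) -> str:
    """Wrap text as a PostScript string literal, escaping \\, ( and )."""
    return "(" + re.sub(r"([\\()])", r"\\\1", text) + ")"


def ps_font_reference(font_name: str, size: float) -> str:
    """Return PostScript that selects font_name at the given size.

    Hypothetical helper for illustration only.
    """
    if not font_name:
        raise ValueError("font name must not be empty")
    if _NEEDS_CVN.search(font_name):
        # Build the name at run time: (Kochi Gothic) cvn
        name_expr = _ps_string(font_name) + " cvn"
    else:
        name_expr = "/" + font_name
    return f"{name_expr} findfont {size:g} scalefont setfont"
```

With this, ps_font_reference("Kochi Gothic", 12) yields
"(Kochi Gothic) cvn findfont 12 scalefont setfont".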

I tried to reference an ornaments font installed on the Xerox printer,
using the TrueType file provided with the printer to get the metrics.
I got it working by deriving a font with a custom encoding. I don’t know
whether the actual font on the printer was in Type 1 or TrueType format.
The method of deriving a custom encoding would have been the same
anyway. Maybe it was even some proprietary format.

So, it’s difficult to know whether a font that is manually installed on
a printer will be converted or not, accessible as a single-byte font or
a CIDFont, etc. And each make is likely to do it differently.

However, AFAIU from Chris, there is still an interest in fully embedding
a font to allow post-processing by a print bureau. For example,
concatenating several FOP-produced documents into a single big print
job. In that case we don’t care about the printer. Everything remains in
the control of FOP. It’s up to us whether we want to use base fonts or
CID-keyed fonts. And I don’t think the user even wants to know how we do
it, as long as they have the option to either fully embed, or
subset-embed the font.
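
To see why a subset is of little use to a post-processor, consider the
re-indexing Jeremias describes in the quoted message: glyphs are
numbered by first occurrence, so the resulting CIDs only mean something
together with the mapping FOP kept. The Python sketch below is
illustrative only; build_subset, the start index of 1 and the sample
strings are assumptions, not FOP's actual code.

```python
from typing import Dict, List, Tuple


def build_subset(text: str, first_cid: int = 1) -> Tuple[List[int], Dict[int, str]]:
    """Assign CIDs by first occurrence.

    Returns the text encoded as CIDs and the CID -> character map that a
    ToUnicode CMap would have to record.
    """
    cid_of: Dict[str, int] = {}
    encoded: List[int] = []
    for ch in text:
        if ch not in cid_of:
            # Next free index, counted from first_cid.
            cid_of[ch] = first_cid + len(cid_of)
        encoded.append(cid_of[ch])
    to_unicode = {cid: ch for ch, cid in cid_of.items()}
    return encoded, to_unicode


# Two hypothetical documents using the same font.
cids_a, map_a = build_subset("abba")
cids_b, map_b = build_subset("baab")
```

Both documents end up with the same CID sequence even though the text
differs; only the two maps tell them apart. That is why subsets from
separate files cannot simply be merged or re-used.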


Vincent


On 11/11/10 20:35, Jeremias Maerki wrote:
> Hi Chris
> 
> I fully understand the desire to install the font on a PostScript
> printer to keep the PS files smaller. To answer your question: I did not
> ask for the business use case. The problem I'm struggling with in this
> context is how to know about the CID meaning of the font, i.e. the
> multi-byte encoding of the font.
> 
> When we do subsets in FOP, we re-index the glyphs starting with index 1
> (or 3) by occurrence in the document. Only FOP knows which Unicode
> character is represented by which CID. That's why we need the ToUnicode
> CMap in PDF. Otherwise, text extraction would not be so easy.
> 
> In single-byte mode, the whole font is embedded (right now probably with
> the same problems I've just fixed with rev1034094 for the TTF subset).
> In this mode the Adobe character names map into the font, so 8-bit
> encodings can be built to properly address the right characters even if
> the font is not embedded. That's also how we currently do referenced TTF
> fonts for PDF output.
> 
> If we fully embed the font as a CID font, we currently lose the
> knowledge about which index represents which Unicode character.
> Combining the font with a suitable CMap resolves the problem but at the
> moment we only use Identity-H which is a 1:1 mapping. One solution would
> be to turn the Unicode "cmap" table in the TrueType font into a custom PS
> CMap and then use 16-bit Unicode characters directly. FOP currently
> doesn't support that.
> 
> Also, if some PS platform allows uploading naked TrueType fonts, how
> will they be represented in the PS VM? Are they CID fonts then or
> single-byte fonts? If they are CID fonts, which CID system are they
> following? I have no idea. The only way to be sure about this is by
> installing a CID font plus CMap that is generated by FOP (which can be
> done by extracting these resources from one of the PS streams). After
> that, the font can be referenced, but it may not be portable to other
> PS-generating applications.
> 
> And then, as Glen mentioned we have to have a strategy to deal with
> glyphs with no representation in Unicode. I think I get where he goes
> with that and it seems to be close to the CMap I mentioned above that is
> derived from the Unicode "cmap" table in the TrueType font. At any rate,
> FOP then has to learn to output Unicode characters (including private
> area chars) instead of arbitrary CIDs coming from subsetting.
> 
> In the end, I'm not 100% sure I've understood all implications here. I hope
> we'll get there soon. I guess a Wiki page would do us good here.
> 
> On 11.11.2010 17:50:46 Chris Bowditch wrote:
>> Hi All,
>>
>> On 09/11/2010 14:43, Jeremias Maerki wrote:
>>> On 09.11.2010 14:48:30 Vincent Hennebert wrote:
>>>> There may be an interest in fully embedding a font for PostScript
>>>> output. IIUC there may be a print manager that pre-processes PostScript
>>>> files, extracts embedded fonts to store them somewhere and re-use them
>>>> whenever needed. It can then strip the font off subsequent files and
>>>> substantially lighten them, speeding up the printing process.
>>> It makes the files smaller, but that will be the only thing that
>>> improves printing performance. The PS interpreter still has to parse and
>>> process the actual resource. It also needs to be noted that extracting
>>> subset fonts doesn't make sense. I've already added the unique-ification
>>> prefix to the TTF font names (like in PDF) to avoid problems like that.
>>
>> Yes I agree extracting subset fonts doesn't make sense, but extracting a 
>> fully embedded font does have plenty of business applications. Which is 
>> precisely why the introduction of a setting is required here. In some 
>> cases it is important to bring the file size down; enter the subsetting 
>> feature. Subsetting is particularly useful when creating print streams
>> with a relatively small number of pages, i.e. 100 or fewer, and you have
>> large Unicode fonts to support Eastern character sets.
>>
>>   In other situations people using FOP want to be able to create large 
>> Print streams to send to Print Bureaus. Print Bureaus tend to use
>> software to parse Print streams rather than sending them directly to a 
>> printer. Those processes will often need to be able to process the 
>> fonts, which they can only do if the full font is embedded rather than a 
>> subset. As you already noted above, extracting a subset is useless.
>>
>>>> What’s the purpose of the ‘encoding’ parameter? It looks to me like
>>>> users don’t care about what encoding is used in the PDF or PostScript
>>>> file. All they want to have is properly printed documents that use their
>>>> own fonts. I think that parameter should be removed in favour of Mehdi’s
>>>> proposal, which IMO makes much more sense from a user perspective.
>>> I don't know if it's necessary. That's why I wrote that maybe additional
>>> research may be necessary. If we don't have it, we may have to build up
>>> a /CIDMap that covers Unicode because there is otherwise no information
>>> in the font which character indices correspond to which glyph as long as
>>> we use /Registry (Adobe) /Ordering (Identity). Or: you configure a CID
>>> map (encoding) that is tailored to the kind of document you want to
>>> produce. The Unicode /CIDMap could result in rather big /CIDMap arrays
>>> (65536 * 4 bytes = 256 KB) with lots of pointers to ".notdef".
>>
>> From a user's perspective, the encoding parameter is too technical and
>> most users will not understand its purpose. If possible I would like to
>> reach a consensus on what we should do and then remove the parameter to
>> help cut down the complexity of configuring fonts. As you noted there 
>> are now a bewildering number of options.
>>
>>> Before continuing with this there should be a broad understanding how
>>> non-subset TrueType fonts shall be handled in PostScript (and PDF where
>>> you can make the same case). Otherwise, a change like Mehdi proposed
>>> doesn't improve anything.
>>
>> Are you asking what the business use case is for fully embedded fonts as
>> opposed to subset fonts? The ability to post-process is the most
>> important use case. If the fonts are subset it becomes difficult to merge
>> PostScript files together or extract the font. Both are fairly common at
>> Print bureaus.
>>>> Granted, there would be some redundancy with the referenced-fonts
>>>> element. But is the additional flexibility of regexp really useful in
>>>> the first place? I’m not too sure. Maybe that could be removed too.
>>> I don't want that removed. I've been grateful for its existence more
>>> than once. With the regexp I can make sure that, for example, all
>>> variants of the "Frutiger" font are not embedded: Frutiger 45 Light,
>>> Frutiger 55 Roman etc. etc.
>>
>> I concur the regexp stuff in the font referencing is useful. We can use 
>> it to change the way whole font families are referenced without having 
>> to list every font.
>>> Anyway, I don't like constantly changing the way fonts are configured.
>>> There's enough confusion with the way it's currently done already. I
>>> won't veto a change like that but I'm not happy with it.
>>
>> I understand what you are saying: there are a lot of options. But then
>> the requirements around fonts are complex, so there is no escaping a
>> complex configuration file.
>>
>> Thanks,
>>
>> Chris
>>
>>>> Vincent
>>>>
>>>>
>>>> On 09/11/10 12:45, Jeremias Maerki wrote:
>>>>> Hi Mehdi,
>>>>> I'm against that since we already have mechanisms to control some of
>>>>> these traits and this would overlap with them. For example, we have the
>>>>> referenced-fonts element
>>>>> (http://xmlgraphics.apache.org/fop/trunk/fonts.html#embedding)
>>>>> which controls whether we embed or not. And we have the encoding-mode
>>>>> attribute on the font element to control if single-byte or cid mode
>>>>> should be used. Granted, that's not exactly what you're after, but I
>>>>> believe this already covers 95% of the use cases if not more.
>>>>>
>>>>> The only thing you can't currently do is embed a full font in CID mode
>>>>> (or reference it). The problem here is the character map that should be
>>>>> used when in CID mode. I think that would require some research first so
>>>>> we know how best to handle this. For example, referencing only makes
>>>>> sense if a TrueType font can be installed directly on the printer. But
>>>>> then, the question is in which mode the characters can be addressed.
>>>>> Single-byte (like we currently fall back to) is probably not a problem
>>>>> unless you need to print Asian documents. Please note that we also don't
>>>>> support full TTF embedding/referencing in CID mode in PDF documents. So
>>>>> I'm not sure if we really need that at the moment.
>>>>>
>>>>> If we do, I believe it would generally suffice to extend encoding-mode
>>>>> from (auto|single-byte|cid) to (auto|single-byte|cid|cid-full). We may
>>>>> need a "cmap" parameter then to change the default CMap (currently
>>>>> "Identity-H" like in PDF) since our subsetting code uses custom mappings,
>>>>> not Unicode or any other encoding scheme (like "90ms-RKSJ-H").
>>>>>
>>>>> On 09.11.2010 12:08:36 mehdi houshmand wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I'm working on making TTF subset embedding configurable such that a
>>>>>> user can opt for either full font embedding, subset embedding or just
>>>>>> referencing, this would be extending the work Jeremias submitted. I
>>>>>> was considering adding a parameter to the font configuration file
>>>>>> called "embedding" with 3 possible values "none", "subset" and "full".
>>>>>> This would allow the user to configure the embedding mode on a font by
>>>>>> font basis. What do people think about this proposal?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> Mehdi
>>>>>
>>>>>
>>>>>
>>>>> Jeremias Maerki
>>>>>
>>>
>>>
>>>
>>> Jeremias Maerki
>>>
>>>
>>>
> 
> 
> 
> 
> Jeremias Maerki
>
