On 09/11/2010 14:43, Jeremias Maerki wrote:
On 09.11.2010 14:48:30 Vincent Hennebert wrote:
There may be an interest in fully embedding a font for PostScript
output. IIUC there may be a print manager that pre-processes PostScript
files, extracts embedded fonts to store them somewhere and re-use them
whenever needed. It can then strip the font off subsequent files and
substantially lighten them, speeding up the printing process.
It makes the files smaller, but that will be the only thing that
improves printing performance. The PS interpreter still has to parse and
process the actual resource. It also needs to be noted that extracting
subset fonts doesn't make sense. I've already added the unique-ification
prefix to the TTF font names (like in PDF) to avoid problems like that.
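A minimal Java sketch of such a prefix follows. It assumes the PDF convention for subset names: six uppercase letters and a "+" placed before the base font name. The class and method names (`SubsetPrefix`, `uniqueName`, `tagFor`) are hypothetical and are not FOP's actual code. A simple counter gives each subset written into one file a distinct tag.

```java
/**
 * Hypothetical helper: produces PDF-style subset names such as "AAAAAA+Arial"
 * so that differently subsetted copies of one TrueType font get distinct names
 * within a single output file. Not FOP's actual implementation.
 */
final class SubsetPrefix {
    private int counter = 0;

    /** Returns the next unique subset name for the given base font name. */
    public String uniqueName(String baseFontName) {
        return tagFor(counter++) + "+" + baseFontName;
    }

    /** Encodes n as six uppercase letters (base 26, most significant letter first). */
    public static String tagFor(int n) {
        char[] letters = new char[6];
        for (int i = 5; i >= 0; i--) {
            letters[i] = (char) ('A' + (n % 26));
            n /= 26;
        }
        return new String(letters);
    }

    public static void main(String[] args) {
        SubsetPrefix prefix = new SubsetPrefix();
        System.out.println(prefix.uniqueName("Arial")); // AAAAAA+Arial
        System.out.println(prefix.uniqueName("Arial")); // AAAAAB+Arial
    }
}
```

Because the tag is derived from a counter, two subsets of the same font never share a name within a single file. That is the collision the prefix is meant to avoid when fonts are later extracted.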
Yes I agree extracting subset fonts doesn't make sense, but extracting a
fully embedded font does have plenty of business applications. Which is
precisely why the introduction of a setting is required here. In some
cases it is important to bring the file size down; enter the subsetting
feature. Subsetting is particularly useful when creating print streams
with a relatively small number of pages (e.g. 100 or fewer) that use
large Unicode fonts to support Eastern character sets.
In other situations people using FOP want to be able to create large
print streams to send to print bureaus. Print bureaus tend to use
software to parse print streams rather than sending them directly to a
printer. Those processes often need to process the fonts, which they
can only do if the full font is embedded rather than a subset. As you
already noted above, extracting a subset is useless.
What’s the purpose of the ‘encoding’ parameter? It looks to me like
users don’t care about what encoding is used in the PDF or PostScript
file. All they want to have is properly printed documents that use their
own fonts. I think that parameter should be removed in favour of Mehdi’s
proposal, which IMO makes much more sense from a user perspective.
I don't know if it's necessary. That's why I wrote that additional
research may be necessary. If we don't have it, we may have to build up
a /CIDMap that covers Unicode because otherwise the font contains no
information about which character indices correspond to which glyphs as
long as we use /Registry (Adobe) /Ordering (Identity). Or: you configure a CID
map (encoding) that is tailored to the kind of document you want to
produce. The Unicode /CIDMap could result in rather big /CIDMap arrays
(65535 * 4 = 256KB) with lots of pointers to ".notdef".
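The following Java sketch makes the size estimate concrete. It builds a CID-to-glyph table for the whole Basic Multilingual Plane (0x10000 entries). It assumes that CIDs are taken directly from Unicode code points and that unmapped entries point to glyph 0, which TrueType reserves for .notdef. `UnicodeCidMap` and the `unicodeToGlyph` map are hypothetical; the map stands in for the font's cmap table. The 4 bytes per entry is the figure quoted in the message, not a property of the PostScript format.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: a CID-to-glyph table covering the BMP, assuming
 * CID == Unicode code point. unicodeToGlyph stands in for the font's cmap.
 */
final class UnicodeCidMap {
    static final int NOTDEF_GLYPH = 0;    // glyph index 0 is .notdef in TrueType
    static final int BMP_SIZE = 0x10000;  // CIDs 0..0xFFFF

    public static int[] build(Map<Integer, Integer> unicodeToGlyph) {
        int[] cidToGlyph = new int[BMP_SIZE]; // zero-filled, i.e. every entry is .notdef
        for (Map.Entry<Integer, Integer> e : unicodeToGlyph.entrySet()) {
            int codePoint = e.getKey();
            if (codePoint >= 0 && codePoint < BMP_SIZE) {
                cidToGlyph[codePoint] = e.getValue();
            }
        }
        return cidToGlyph;
    }

    /** Counts entries that still point to .notdef. */
    public static int countNotdef(int[] cidToGlyph) {
        int count = 0;
        for (int glyph : cidToGlyph) {
            if (glyph == NOTDEF_GLYPH) {
                count++;
            }
        }
        return count;
    }

    /** Size of the table at an assumed number of bytes per entry. */
    public static long sizeInBytes(int[] cidToGlyph, int bytesPerEntry) {
        return (long) cidToGlyph.length * bytesPerEntry;
    }

    public static void main(String[] args) {
        // A Latin-only font: 'A'..'Z' mapped to made-up glyph indices 36..61.
        Map<Integer, Integer> cmap = new HashMap<>();
        for (int c = 'A'; c <= 'Z'; c++) {
            cmap.put(c, 36 + (c - 'A'));
        }
        int[] table = build(cmap);
        // prints "65510 of 65536 entries are .notdef"
        System.out.println(countNotdef(table) + " of " + table.length + " entries are .notdef");
        // prints "262144 bytes at 4 bytes per entry"
        System.out.println(sizeInBytes(table, 4) + " bytes at 4 bytes per entry");
    }
}
```

For a font that covers only a small script, nearly every entry points to .notdef. This is the waste that a document-specific CID map (encoding) would avoid.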
From a user's perspective, the encoding parameter is too technical and
most users will not understand its purpose. If possible I would like to
reach a consensus on what we should do and then remove the parameter to
help cut down the complexity of configuring fonts. As you noted there
are now a bewildering number of options.
Before continuing with this there should be a broad understanding of how
non-subset TrueType fonts shall be handled in PostScript (and PDF where
you can make the same case). Otherwise, a change like Mehdi proposed
doesn't improve anything.
Are you asking what the business use case is for fully embedded fonts as
opposed to subset fonts? The ability to post-process is the most
important use case. If the fonts are subset it becomes difficult to merge
PostScript files together or extract the fonts. Both are fairly common at
Granted, there would be some redundancy with the referenced-fonts
element. But is the additional flexibility of regexp really useful in
the first place? I’m not too sure. Maybe that could be removed too.
I don't want that removed. I've been grateful for its existence more
than once. With the regexp I can make sure that, for example, all
variants of the "Frutiger" font are not embedded: Frutiger 45 Light,
Frutiger 55 Roman etc. etc.
I agree that the regexp support in font referencing is useful. We can use
it to change the way whole font families are referenced without having
to list every font.
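The following Java sketch shows how regular-expression matching lets one entry cover a whole family such as the Frutiger variants mentioned above. It assumes each pattern must match the entire font-family name, using Java's `Matcher.matches()`. Whether FOP matches whole names or substrings is not stated in this thread, and the `ReferencedFonts` class is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch: decides whether a font family is referenced (not embedded). */
final class ReferencedFonts {
    private final List<Pattern> patterns = new ArrayList<>();

    public ReferencedFonts(List<String> regexes) {
        for (String regex : regexes) {
            patterns.add(Pattern.compile(regex));
        }
    }

    /** True if any configured pattern matches the whole family name. */
    public boolean isReferenced(String fontFamily) {
        for (Pattern p : patterns) {
            if (p.matcher(fontFamily).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ReferencedFonts refs = new ReferencedFonts(Arrays.asList("Frutiger.*"));
        System.out.println(refs.isReferenced("Frutiger 45 Light")); // true
        System.out.println(refs.isReferenced("Frutiger 55 Roman")); // true
        System.out.println(refs.isReferenced("Helvetica"));         // false
    }
}
```

One pattern replaces a list of every weight and style, which is the convenience being defended here.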
Anyway, I don't like constantly changing the way fonts are configured.
There's enough confusion with the way it's currently done already. I
won't veto a change like that but I'm not happy with it.
I understand what you are saying: there are a lot of options. But the
requirements around fonts are complex, so there is no escaping a
complex configuration file.
On 09/11/10 12:45, Jeremias Maerki wrote:
I'm against that since we already have mechanisms to control some of
these traits and this would overlap with them. For example, we have the
which controls whether we embed or not. And we have the encoding-mode
attribute on the font element to control if single-byte or cid mode
should be used. Granted, that's not exactly what you're after, but I
believe this already covers 95% of the use cases if not more.
The only thing you can't currently do is embed a full font in CID mode
(or reference it). The problem here is the character map that should be
used when in CID mode. I think that would require some research first so
we know how best to handle this. For example, referencing only makes
sense if a TrueType font can be installed directly on the printer. But
then, the question is in which mode the characters can be addressed.
Single-byte (like we currently fall back to) is probably not a problem
unless you need to print Asian documents. Please note that we also don't
support full TTF embedding/referencing in CID mode in PDF documents. So
I'm not sure if we really need that at the moment.
If we do, I believe it would generally suffice to extend encoding-mode
from (auto|single-byte|cid) to (auto|single-byte|cid|cid-full). We may
need a "cmap" parameter then to change the default CMap (currently
"Identity-H" like in PDF) since our subsetting code uses custom mappings,
not Unicode or any other encoding scheme (like "90ms-RKSJ-H").
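The following Java sketch shows one way the proposed extension could be modelled. It covers the four encoding-mode values from the proposal and an optional cmap parameter that falls back to "Identity-H". `EncodingMode`, `CidFontSettings` and their members are hypothetical names and do not reflect FOP's actual classes.

```java
import java.util.Locale;

/** Hypothetical model of the encoding-mode attribute, including the proposed "cid-full". */
enum EncodingMode {
    AUTO("auto"),
    SINGLE_BYTE("single-byte"),
    CID("cid"),
    CID_FULL("cid-full"); // proposed: full (non-subset) font in CID mode

    private final String configName;

    EncodingMode(String configName) {
        this.configName = configName;
    }

    /** Parses the attribute value, ignoring case and surrounding whitespace. */
    public static EncodingMode fromConfig(String value) {
        String normalized = value.trim().toLowerCase(Locale.ROOT);
        for (EncodingMode mode : values()) {
            if (mode.configName.equals(normalized)) {
                return mode;
            }
        }
        throw new IllegalArgumentException("Unknown encoding-mode: " + value);
    }
}

/** Hypothetical per-font settings: encoding mode plus an optional CMap override. */
final class CidFontSettings {
    static final String DEFAULT_CMAP = "Identity-H";

    final EncodingMode mode;
    final String cmap;

    CidFontSettings(String encodingMode, String cmap) {
        this.mode = EncodingMode.fromConfig(encodingMode);
        // Fall back to Identity-H when no "cmap" parameter is configured.
        this.cmap = (cmap == null || cmap.isEmpty()) ? DEFAULT_CMAP : cmap;
    }
}
```

In this shape `cid-full` extends the existing attribute instead of introducing a second, overlapping one. A tailored CMap such as "90ms-RKSJ-H" is only needed when the default does not fit the document.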
On 09.11.2010 12:08:36 mehdi houshmand wrote:
I'm working on making TTF subset embedding configurable such that a
user can opt for full font embedding, subset embedding or just
referencing. This would extend the work Jeremias submitted. I
was considering adding a parameter to the font configuration file
called "embedding" with 3 possible values "none", "subset" and "full".
This would allow the user to configure the embedding mode on a font by
font basis. What do people think about this proposal?
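The following Java sketch models the proposed per-font "embedding" parameter. `EmbeddingMode` and `fromConfig` are hypothetical names. The proposal does not say what the default should be, so the caller passes a fallback that is used when the attribute is absent.

```java
import java.util.Locale;

/** Hypothetical model of the proposed "embedding" font parameter. */
enum EmbeddingMode {
    NONE,   // reference the font only; the output device must supply it
    SUBSET, // embed only the glyphs actually used
    FULL;   // embed the complete font

    /** Parses "none", "subset" or "full"; returns fallback when the value is absent. */
    public static EmbeddingMode fromConfig(String value, EmbeddingMode fallback) {
        if (value == null || value.trim().isEmpty()) {
            return fallback;
        }
        switch (value.trim().toLowerCase(Locale.ROOT)) {
            case "none":
                return NONE;
            case "subset":
                return SUBSET;
            case "full":
                return FULL;
            default:
                throw new IllegalArgumentException("Unknown embedding value: " + value);
        }
    }
}
```

For example, `fromConfig("full", EmbeddingMode.SUBSET)` yields `FULL`, and a font entry without the attribute gets whatever fallback the caller chooses.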