TrueType Font Embedding
Hi,

I'm working on making TTF subset embedding configurable, so that a user can opt for full font embedding, subset embedding, or just referencing. This would extend the work Jeremias submitted.

I was considering adding a parameter called embedding to the font configuration file, with three possible values: none, subset and full. This would allow the user to configure the embedding mode on a font-by-font basis.

What do people think about this proposal?

Thanks,
Mehdi
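A minimal sketch of how the three proposed values could be modelled, assuming a Java 5 enum. The names EmbeddingMode, getConfigValue and fromConfigValue are hypothetical and not taken from the FOP code base; only the values none, subset and full come from the proposal above.

```java
import java.util.Locale;

/** Hypothetical enum for the proposed "embedding" setting; not part of FOP. */
enum EmbeddingMode {
    NONE("none"),      // reference the font only, do not embed it
    SUBSET("subset"),  // embed only the glyphs the document uses
    FULL("full");      // embed the complete font

    private final String configValue;

    EmbeddingMode(String configValue) {
        this.configValue = configValue;
    }

    /** Returns the string as it would appear in the configuration file. */
    String getConfigValue() {
        return configValue;
    }

    /** Maps a configuration string to a mode and rejects unknown values. */
    static EmbeddingMode fromConfigValue(String value) {
        if (value == null) {
            throw new IllegalArgumentException("embedding value must not be null");
        }
        String normalized = value.trim().toLowerCase(Locale.ROOT);
        for (EmbeddingMode mode : values()) {
            if (mode.configValue.equals(normalized)) {
                return mode;
            }
        }
        throw new IllegalArgumentException("Invalid embedding value: " + value);
    }
}

class EmbeddingModeDemo {
    public static void main(String[] args) {
        // Parse the value a user might have written for one font.
        System.out.println(EmbeddingMode.fromConfigValue("subset")); // prints SUBSET
    }
}
```

Rejecting unknown strings early, rather than silently falling back to a default, is one possible design choice here; the proposal itself does not say how invalid values should be treated.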
Re: TrueType Font Embedding
There may be an interest in fully embedding a font for PostScript output. IIUC, there may be a print manager that pre-processes PostScript files, extracts embedded fonts to store them somewhere and re-use them whenever needed. It can then strip the font off subsequent files and substantially lighten them, speeding up the printing process.

What’s the purpose of the ‘encoding’ parameter? It looks to me like users don’t care about what encoding is used in the PDF or PostScript file. All they want to have is properly printed documents that use their own fonts. I think that parameter should be removed in favour of Mehdi’s proposal, which IMO makes much more sense from a user perspective.

Granted, there would be some redundancy with the referenced-fonts element. But is the additional flexibility of regexp really useful in the first place? I’m not too sure. Maybe that could be removed too.

Vincent

On 09/11/10 12:45, Jeremias Maerki wrote:
> Hi Mehdi, I'm against that since we already have mechanisms to control some of these traits and this would overlap with them. [...]
Re: TrueType Font Embedding
Hi Mehdi,

I'm against that, since we already have mechanisms to control some of these traits and this would overlap with them. For example, we have the referenced-fonts element (http://xmlgraphics.apache.org/fop/trunk/fonts.html#embedding), which controls whether we embed or not. And we have the encoding-mode attribute on the font element to control whether single-byte or CID mode should be used. Granted, that's not exactly what you're after, but I believe this already covers 95% of the use cases, if not more.

The only thing you can't currently do is embed a full font in CID mode (or reference it). The problem here is the character map that should be used in CID mode. I think that would require some research first so we know how best to handle this. For example, referencing only makes sense if a TrueType font can be installed directly on the printer. But then the question is in which mode the characters can be addressed. Single-byte (which we currently fall back to) is probably not a problem unless you need to print Asian documents.

Please note that we also don't support full TTF embedding/referencing in CID mode in PDF documents. So I'm not sure we really need that at the moment. If we do, I believe it would generally suffice to extend encoding-mode from (auto|single-byte|cid) to (auto|single-byte|cid|cid-full). We may then need a cmap parameter to change the default CMap (currently Identity-H, as in PDF), since our subsetting code uses custom mappings, not Unicode or any other encoding scheme (like 90ms-RKSJ-H).

On 09.11.2010 12:08:36 mehdi houshmand wrote:
> Hi, I'm working on making TTF subset embedding configurable such that a user can opt for either full font embedding, subset embedding or just referencing [...]

Jeremias Maerki
Re: TrueType Font Embedding
On 09.11.2010 14:48:30 Vincent Hennebert wrote:
> There may be an interest in fully embedding a font for PostScript output. IIUC there may be a print manager that pre-processes PostScript files, extracts embedded fonts to store them somewhere and re-use them whenever needed. It can then strip the font off subsequent files and substantially lighten them, speeding up the printing process.

It makes the files smaller, but that will be the only thing that improves printing performance. The PS interpreter still has to parse and process the actual resource. It also needs to be noted that extracting subset fonts doesn't make sense. I've already added the unique-ification prefix to the TTF font names (like in PDF) to avoid problems like that.

> What’s the purpose of the ‘encoding’ parameter? It looks to me like users don’t care about what encoding is used in the PDF or PostScript file. All they want to have is properly printed documents that use their own fonts. I think that parameter should be removed in favour of Mehdi’s proposal, which IMO makes much more sense from a user perspective.

I don't know if it's necessary. That's why I wrote that additional research may be necessary. If we don't have it, we may have to build up a /CIDMap that covers Unicode, because otherwise the font contains no information about which character indices correspond to which glyphs, as long as we use /Registry (Adobe) /Ordering (Identity). Or: you configure a CID map (encoding) that is tailored to the kind of document you want to produce. The Unicode /CIDMap could result in rather big /CIDMap arrays (65535 * 4 bytes, roughly 256 KB) with lots of pointers to .notdef.

Before continuing with this, there should be a broad understanding of how non-subset TrueType fonts shall be handled in PostScript (and in PDF, where you can make the same case). Otherwise, a change like the one Mehdi proposed doesn't improve anything.

> Granted, there would be some redundancy with the referenced-fonts element. But is the additional flexibility of regexp really useful in the first place? I’m not too sure. Maybe that could be removed too.

I don't want that removed. I've been grateful for its existence more than once. With the regexp I can make sure that, for example, all variants of the Frutiger font are not embedded: Frutiger 45 Light, Frutiger 55 Roman, etc. Anyway, I don't like constantly changing the way fonts are configured. There's enough confusion with the way it's currently done already. I won't veto a change like that, but I'm not happy with it.

Jeremias Maerki
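Two points from the message above can be sketched in a few lines of Java: matching a whole font family with a single regular expression, and the size estimate for a /CIDMap covering Unicode. This is only an illustration. The pattern "Frutiger.*", the class name and the method names are assumptions, and the 4 bytes per entry are simply the figure used in the message, not a value taken from the PostScript specification.

```java
import java.util.regex.Pattern;

class ReferencedFontsSketch {
    // Hypothetical pattern: the message only says that one regexp can cover
    // all variants of the Frutiger family.
    static final Pattern FRUTIGER = Pattern.compile("Frutiger.*");

    /** Returns true if the whole font name matches, i.e. the font would be referenced rather than embedded. */
    static boolean isReferenced(String fontName) {
        return FRUTIGER.matcher(fontName).matches();
    }

    /** Size in bytes of a map with one fixed-width entry per character index. */
    static long cidMapBytes(int entries, int bytesPerEntry) {
        return (long) entries * bytesPerEntry;
    }

    public static void main(String[] args) {
        System.out.println(isReferenced("Frutiger 45 Light")); // true
        System.out.println(isReferenced("Frutiger 55 Roman")); // true
        System.out.println(isReferenced("Helvetica"));         // false
        // 65535 entries * 4 bytes = 262140 bytes, i.e. just under 256 KB.
        System.out.println(cidMapBytes(65535, 4));             // 262140
    }
}
```

A single pattern keeps the configuration short when a family has many variants, which is the flexibility argued for above.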
DO NOT REPLY [Bug 50240] [PATCH] Upgrade to Java 1.5 - Converted EncodingMode to an Enum
https://issues.apache.org/bugzilla/show_bug.cgi?id=50240

--- Comment #1 from Mehdi Houshmand med1...@gmail.com 2010-11-09 11:39:46 EST ---
Created an attachment (id=26273)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=26273)
EncodingMode changed from class -> enum + JUnit test

Sorry, I failed to add the JUnit test.

--
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.
DO NOT REPLY [Bug 50240] New: [PATCH] Upgrade to Java 1.5 - Converted EncodingMode to an Enum
https://issues.apache.org/bugzilla/show_bug.cgi?id=50240

Summary: [PATCH] Upgrade to Java 1.5 - Converted EncodingMode to an Enum
Product: Fop
Version: all
Platform: PC
OS/Version: Linux
Status: NEW
Severity: minor
Priority: P2
Component: fonts
AssignedTo: fop-dev@xmlgraphics.apache.org
ReportedBy: med1...@gmail.com

Created an attachment (id=26272)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=26272)
EncodingMode changed from class -> enum

In light of the recent vote to convert to Java 5, I have changed o.a.f.fonts.EncodingMode.java to an enumerated type. I've also added a JUnit test for this enum. I have tested this enum with various fonts and diffed the output to ensure there are no differences.
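For readers without the attachment at hand, the following is a rough sketch of what such a class-to-enum conversion might look like. It is not the attached patch: the type name EncodingModeSketch and the methods getName and fromName are assumptions, and only the three values (auto, single-byte, cid) are taken from the encoding-mode discussion earlier in this thread.

```java
/** Illustrative enum with the three encoding-mode values; not the actual FOP class. */
enum EncodingModeSketch {
    AUTO("auto"),
    SINGLE_BYTE("single-byte"),
    CID("cid");

    private final String name;

    EncodingModeSketch(String name) {
        this.name = name;
    }

    /** Returns the name as written in the font configuration. */
    String getName() {
        return name;
    }

    /** Looks up a mode by its configuration name, ignoring case. */
    static EncodingModeSketch fromName(String name) {
        for (EncodingModeSketch mode : values()) {
            if (mode.name.equalsIgnoreCase(name)) {
                return mode;
            }
        }
        throw new IllegalArgumentException("Invalid encoding mode: " + name);
    }
}

class EncodingModeSketchDemo {
    public static void main(String[] args) {
        // Round trip: every value must be recoverable from its own name,
        // which is the kind of check a unit test for the enum could make.
        for (EncodingModeSketch mode : EncodingModeSketch.values()) {
            if (EncodingModeSketch.fromName(mode.getName()) != mode) {
                throw new AssertionError("round trip failed for " + mode);
            }
        }
        System.out.println(EncodingModeSketch.fromName("single-byte")); // prints SINGLE_BYTE
    }
}
```

Compared with a hand-written type-safe class, an enum gives identity comparison with ==, use in switch statements, and values() for iteration without extra code.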
Re: DO NOT REPLY [Bug 50240] New: [PATCH] Upgrade to Java 1.5 - Converted EncodingMode to an Enum
You may find some of the code in the now-stagnant SourceForge Folio project useful. It has lots of enums defined, for example. It is licensed under the MPL, but just let me know and I'll change the licence.

Peter West

He said to them, Come and see.

On 10/11/2010, at 2:35 AM, bugzi...@apache.org wrote:
> https://issues.apache.org/bugzilla/show_bug.cgi?id=50240
> Summary: [PATCH] Upgrade to Java 1.5 - Converted EncodingMode to an Enum
> [...]