It turns out that implementing solution 1 isn't so easy in PDF. Actually, even my naming was wrong: it's not a CIDFont I wanted to use. I just wanted to use character codes larger than 255 as input to a CMap which maps them to character names for Type 1 fonts (see the illustration in the PostScript Language Reference, Third Edition, page 367). That's easy in PostScript but not supported in PDF 1.4. Grmbl.
If I could somehow convert a Type 1 font into a CIDFont, I could do it. The key to using a CIDFont is to use CIDs (character IDs) instead of character names. But I think this conversion is probably more complicated than implementing solution 2. So I'm heading for solution 2.

On 12.02.2008 15:25:21 Jeremias Maerki wrote:
> I've been asked to look into the possibility of supporting unusual
> encodings (like Cyrillic) with Type 1 fonts. Right now we only support
> WinAnsiEncoding (plus special handling for Symbol and ZapfDingbats).
>
> I already have an AFM parser. The AFM parser is the precondition for
> safely supporting non-standard encodings, as only this file contains the
> glyph list of a font.
>
> I'm now well on the way to supporting non-WinAnsi encodings since I can
> now build CodePointMapping instances from an AFM file. I then have to
> teach the PDF and PS renderers to make use of these special encodings.
>
> That's step 1, but it will only make the font's native encoding
> available in FOP. The number of available glyphs for a Type 1 font will
> still remain under 255 (typically under 223, as the first 32 chars are
> usually not used). To support all glyphs of a Type 1 font we need more,
> and I found two possible ways to pursue:
>
> 1. Treat Type 1 fonts as CID fonts.
>
> + Probably the cleaner approach.
> + All glyphs are supported under one single font (no renderer-level
> font switching required, see below).
> - Makes the generated PDF/PS code a little less readable, but that's not
> important.
>
> 2. Do something like OpenOffice does when handling fonts with more than
> 255 chars: create multiple single-byte encodings which map to the same
> base font. This will require a 1:n relationship from font to char
> mapping, which the renderers also have to handle. The first encoding
> will be equal to the font's default encoding (PDF calls that the
> "implicit base encoding"). The other encoding(s) will be built from the
> rest of the available characters. In the renderer it will be necessary
> to switch fonts from one character to another (not the same as
> switching from Helvetica to Symbol, i.e. not at FO level, but at
> renderer level).
>
> + Higher compatibility with PDF viewers which are not yet
> feature-complete.
> + Keeps the generated PDF/PS code more readable (not important).
> - Switching between derived fonts (i.e. fonts with a common base font
> but with special encodings) is necessary. SingleByteFont needs to be
> split into two classes.
>
> An example: the "Baskerville Cyrillic" font contains 264
> characters/glyphs. The default encoding only contains 221 characters. So
> 43 additional characters can be made available like this.
>
> I'm currently leaning towards CID fonts as it is probably the cleaner
> approach. Both solutions are probably pretty much the same in terms of
> effort. The CID approach will take more work in the PS renderer, and the
> multi-encoding approach will require changes in FOP's font library.
>
> If anyone has thoughts on this, I'd appreciate it. I'll finish the
> changes for supporting the default encodings and then finish the
> processing feedback stuff before I finish this here.
>
> Jeremias Maerki

Jeremias Maerki
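For illustration, here is a minimal Java sketch of solution 2 (the multi-encoding approach) as described above; it is not FOP code. All names (`MultiEncodingSketch`, `SingleByteEncoding`, `Run`, `buildEncodings`, `splitIntoRuns`) are hypothetical. It assumes each glyph is identified by a Unicode char (a simplification; Type 1 fonts actually address glyphs by name), keeps the first encoding equal to the font's default encoding, packs the remaining glyphs into extra encodings using codes 32..255 (leaving 0..31 unused), and splits text into runs so a renderer knows where to switch between the derived fonts. The sample data are placeholders that only mimic the 221 + 43 glyph counts from the Baskerville Cyrillic example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MultiEncodingSketch {

    /** Hypothetical derived single-byte encoding: char -> byte code (0..255). */
    static final class SingleByteEncoding {
        final String name;
        final Map<Character, Integer> codes = new LinkedHashMap<>();
        SingleByteEncoding(String name) { this.name = name; }
    }

    /** Hypothetical run of consecutive chars that share one derived encoding. */
    static final class Run {
        final int encodingIndex;
        final List<Integer> codes = new ArrayList<>();
        Run(int encodingIndex) { this.encodingIndex = encodingIndex; }
    }

    /** First encoding = default encoding; leftover glyphs fill extra encodings (codes 32..255). */
    static List<SingleByteEncoding> buildEncodings(Map<Character, Integer> defaultEncoding,
                                                   List<Character> allGlyphs) {
        List<SingleByteEncoding> result = new ArrayList<>();
        SingleByteEncoding base = new SingleByteEncoding("implicit-base");
        base.codes.putAll(defaultEncoding);
        result.add(base);
        SingleByteEncoding current = null;
        int nextCode = 256; // forces creation of the first extra encoding when needed
        for (char c : allGlyphs) {
            if (defaultEncoding.containsKey(c)) {
                continue; // already reachable through the base encoding
            }
            if (nextCode > 255) {
                current = new SingleByteEncoding("extra-" + result.size());
                result.add(current);
                nextCode = 32; // skip the usually unused control range 0..31
            }
            current.codes.put(c, nextCode++);
        }
        return result;
    }

    /** Splits text into runs; a renderer would switch derived fonts at each run boundary. */
    static List<Run> splitIntoRuns(String text, List<SingleByteEncoding> encodings) {
        List<Run> runs = new ArrayList<>();
        Run current = null;
        for (char c : text.toCharArray()) {
            int index = -1;
            int code = -1;
            for (int i = 0; i < encodings.size(); i++) { // base encoding is searched first
                Integer candidate = encodings.get(i).codes.get(c);
                if (candidate != null) { index = i; code = candidate; break; }
            }
            if (index < 0) {
                continue; // glyph not in the font: simply dropped in this sketch
            }
            if (current == null || current.encodingIndex != index) {
                current = new Run(index);
                runs.add(current);
            }
            current.codes.add(code);
        }
        return runs;
    }

    /** Placeholder default encoding with 221 entries (codes 32..252 mapped to themselves). */
    static Map<Character, Integer> sampleDefaultEncoding() {
        Map<Character, Integer> m = new LinkedHashMap<>();
        for (int code = 32; code <= 252; code++) {
            m.put((char) code, code);
        }
        return m;
    }

    /** Placeholder glyph list: the 221 default glyphs plus 43 Cyrillic letters (U+0410..U+043A). */
    static List<Character> sampleGlyphList() {
        List<Character> glyphs = new ArrayList<>(sampleDefaultEncoding().keySet());
        for (char c = '\u0410'; c <= '\u043A'; c++) {
            glyphs.add(c);
        }
        return glyphs;
    }

    public static void main(String[] args) {
        List<SingleByteEncoding> encodings = buildEncodings(sampleDefaultEncoding(), sampleGlyphList());
        for (SingleByteEncoding e : encodings) {
            System.out.println(e.name + ": " + e.codes.size() + " glyphs");
        }
        // prints "implicit-base: 221 glyphs" and "extra-1: 43 glyphs"
    }
}
```

With the sample data, the 43 leftover glyphs fit into a single extra encoding (codes 32..74), and a string mixing Latin and Cyrillic letters yields alternating runs. In PDF, each derived encoding would presumably become its own font resource that shares the base font but carries its own encoding (e.g. a /Differences array), which is exactly the renderer-level font switching the message mentions.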