- the current naming convention does not support the mapping to (sequences of characters containing) supplemental characters (i.e. outside the BMP).
- the current document contains all sorts of information beyond the original goal, which was simply to provide a function from glyph names to character strings. I want to come back and stick to the original goal.
- the PUA usage implied by the naming convention is a major disaster for two reasons: first, this was mostly an attempt to encode glyphs (as opposed to characters), and second, this PUA usage needs to be understood by far too many parties for that to happen. I want to deprecate it.
This document presents an algorithm to convert from glyph names to sequences of Unicode characters. [more or less the current sentences that explain how this is useful for e.g. ATM, a Type 1 to OpenType converter.] The revised AGL table will be a combination of the current AGL table and the PUA usage we give; a sample of it appears after the algorithm.
The algorithm is as follows:
1. drop the first "." and everything after it
2. split on "_"
3. convert each fragment produced by step 2 to a string like this:
   3.1. if the fragment is in the left column of the AGL table, use the string in the right column
   3.2. else, if the fragment is of the form "uni[0-9A-F]*" and the number of hex digits is a multiple of four, interpret each group of four digits as a Unicode scalar value
   3.3. else, if the fragment is of the form "u[0-9A-F]*" and the number of hex digits is 4, 5 or 6, interpret those digits as a Unicode scalar value
   3.4. else produce the empty string
4. concatenate the strings obtained by converting each fragment
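As an illustration, the steps above might be sketched in Python as follows. This is only a sketch under stated assumptions: the names `SAMPLE_AGL`, `fragment_to_string` and `glyph_name_to_string` are hypothetical, the table holds just the two sample entries discussed in this message, and the choice to yield the empty string for values that are not Unicode scalar values (surrogates, or anything above 10FFFF) is an assumption that the steps do not spell out.

```python
import re

# Stand-in for the AGL table: only the two sample entries from this message.
SAMPLE_AGL = {
    "A": "\u0041",
    "Asmall": "\u0061",
}

# "uni" followed by one or more groups of four uppercase hex digits.
UNI_RE = re.compile(r"uni((?:[0-9A-F]{4})+)")
# "u" followed by 4, 5 or 6 uppercase hex digits.
U_RE = re.compile(r"u([0-9A-F]{4,6})")


def _scalar(hex_digits):
    """Return the character for a hex string, or None if the value is not
    a Unicode scalar value (assumption: such values are rejected)."""
    value = int(hex_digits, 16)
    if value > 0x10FFFF or 0xD800 <= value <= 0xDFFF:
        return None
    return chr(value)


def fragment_to_string(fragment, agl):
    """Convert one fragment: table lookup, then uni<code>, then u<code>."""
    if fragment in agl:
        return agl[fragment]
    match = UNI_RE.fullmatch(fragment)
    if match:
        digits = match.group(1)
        chars = [_scalar(digits[i:i + 4]) for i in range(0, len(digits), 4)]
        return "" if None in chars else "".join(chars)
    match = U_RE.fullmatch(fragment)
    if match:
        return _scalar(match.group(1)) or ""
    return ""


def glyph_name_to_string(name, agl=SAMPLE_AGL):
    """Drop the suffix after the first ".", split on "_", convert, concatenate."""
    base = name.split(".", 1)[0]
    return "".join(fragment_to_string(f, agl) for f in base.split("_"))
```

For instance, with this sample table, `glyph_name_to_string("A_uni0042_u1D400.swash")` yields the three characters U+0041, U+0042 and U+1D400, the last of which lies outside the BMP.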
| name | sequence |
| A | U+0041 |
| Asmall | U+0061 |
| ... | ... |
The first entry is there because the current AGL says so. The second entry is there because the current AGL says "Asmall -> U+F761" and the PUA usage says that U+F761 decomposes to "<sc> 0061"; dropping the <sc> tag leaves U+0061.
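The way the two sample entries are derived could be sketched like this. The data structures and names (`CURRENT_AGL`, `PUA_DECOMPOSITION`, `revised_entry`, `format_sequence`) are hypothetical, and the tables contain only the values quoted above; the real tables are much larger and are not reproduced here.

```python
# Current AGL entries (glyph name -> code point), limited to the two names above.
CURRENT_AGL = {"A": 0x0041, "Asmall": 0xF761}

# PUA decompositions: PUA code point -> (tag, underlying code points).
PUA_DECOMPOSITION = {0xF761: ("sc", [0x0061])}


def revised_entry(code_point):
    """Replace a PUA code point by its decomposition, dropping the tag;
    any other code point is kept as is."""
    if code_point in PUA_DECOMPOSITION:
        _tag, base = PUA_DECOMPOSITION[code_point]
        return list(base)
    return [code_point]


# Revised table: glyph name -> sequence of code points.
REVISED_AGL = {name: revised_entry(cp) for name, cp in CURRENT_AGL.items()}


def format_sequence(code_points):
    """Render a sequence in the "U+XXXX" notation used in the sample table."""
    return " ".join(f"U+{cp:04X}" for cp in code_points)
```

Here `format_sequence(REVISED_AGL["Asmall"])` gives `U+0061`, matching the second row of the sample.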
What I am still trying to figure out is the extent of the list (i.e. which names will be in it). The first observation is that the current AGL names should all be there. The second observation is that, given the uni<code> and u<code> mechanisms, and given the huge cost and delay in having the list propagated to all the implementations of the algorithm (as Berthold noted), the best course is to freeze the list forever after the next publication. In the end, I think the best approach is to add to the current AGL list a set of widely accepted names, e.g. those that Apple uses in the fonts it ships, and maybe those found in common math fonts; the tricky part is to determine when it's better to reissue a font with more useful names, and when it's better to augment the list. There is one more question about the new AGL list that I have not yet figured out: should it include "uniF761" (and map it to U+0061), or should we let that name be handled by step 3.2 of the algorithm (and map it to U+F761)?
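The uniF761 question can be made concrete with a small sketch. It is a deliberately reduced, single-fragment version of the conversion (table lookup first, then the uni<code> rule), with hypothetical names (`convert`, `without_entry`, `with_entry`); it only shows that the table lookup takes precedence when the name is listed.

```python
import re


def convert(fragment, agl):
    """Table lookup first, then the uni<code> rule; empty string otherwise."""
    if fragment in agl:
        return agl[fragment]
    match = re.fullmatch(r"uni((?:[0-9A-F]{4})+)", fragment)
    if match:
        digits = match.group(1)
        return "".join(
            chr(int(digits[i:i + 4], 16)) for i in range(0, len(digits), 4)
        )
    return ""


# Option 1: the list does not mention "uniF761"; the uni<code> rule applies.
without_entry = {"Asmall": "\u0061"}

# Option 2: the list includes "uniF761" and maps it to U+0061.
with_entry = {**without_entry, "uniF761": "\u0061"}
```

With `without_entry`, `convert("uniF761", ...)` returns the PUA character U+F761; with `with_entry`, it returns U+0061, the same string as for "Asmall".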
Independent of the "Unicode and glyph names" document is the set of mappings published at http://www.unicode.org/Public/MAPPINGS/VENDORS/ADOBE. The first thing to realize is that Adobe is not the source of those mappings. I believe they were originally created by NeXT, for use in Display PostScript applications. I may deal with those after I am done with the "Unicode and glyph names" document.
Eric.
