Apologies if this is a repeat of a (much) earlier inquiry.
 
The mapping tables that are available as part of the Unicode Standard
(http://www.unicode.org/Public/MAPPINGS/) are generally provided in a
text format called "Format A." Each line in the file defines a mapping
between a character in a legacy encoding and its Unicode equivalent,
with fields separated by tabs or sequences of spaces, like this:
 
0xA0    0x00A0  #NO-BREAK SPACE
0xA1    0x00A1  #INVERTED EXCLAMATION MARK
0xA2    0x00A2  #CENT SIGN
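For illustration, here is a minimal Python sketch of reading lines like these. It assumes only the layout visible above: a hex legacy code, a hex Unicode value, and a '#' comment holding the character name. It also assumes that a line with a single field marks an unmapped code. The function name parse_format_a is hypothetical, and without a specification other files may well need more handling.

```python
from typing import Iterable, Iterator, Optional, Tuple

def parse_format_a(lines: Iterable[str]) -> Iterator[Tuple[int, Optional[int], str]]:
    """Yield (legacy_code, unicode_value_or_None, comment) per mapping line.

    Hypothetical helper; assumes the two-column layout shown in the examples.
    """
    for raw in lines:
        # Everything after the first '#' is treated as the comment (character name).
        data, _, comment = raw.partition("#")
        # str.split() with no argument treats tabs and runs of spaces alike.
        fields = data.split()
        if not fields:
            continue  # blank or comment-only line
        # int(..., 16) accepts the "0x" prefix used in the files.
        legacy = int(fields[0], 16)
        # Assumption: a line with only one field marks an unmapped legacy code.
        uni = int(fields[1], 16) if len(fields) > 1 else None
        yield legacy, uni, comment.strip()
```

For example, feeding it "0xA0\t0x00A0\t#NO-BREAK SPACE" yields (0xA0, 0xA0, "NO-BREAK SPACE"), whether tabs or spaces separate the fields.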
 
The format supports double-byte character sets (DBCS) as well:
 
0x8140  0x4E02  #CJK UNIFIED IDEOGRAPH
0x8141  0x4E04  #CJK UNIFIED IDEOGRAPH
0x8142  0x4E05  #CJK UNIFIED IDEOGRAPH
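A rough Python sketch of how such double-byte entries might be used follows. It assumes that a legacy value above 0xFF is a two-byte code written lead byte first, so 0x8140 stands for the bytes 0x81 0x40. The helper names are hypothetical. The decoder tries a two-byte match before a one-byte match, which is a simplification; a real DBCS decoder would normally consult the set of valid lead bytes.

```python
def legacy_bytes(code: int) -> bytes:
    """Convert a Format A legacy value to its byte sequence (hypothetical helper)."""
    # Assumption: values above 0xFF are double-byte codes, lead byte first.
    return code.to_bytes(1 if code <= 0xFF else 2, "big")

def build_decoder(pairs) -> dict:
    """Map legacy byte sequences to Unicode characters from (legacy, unicode) pairs."""
    return {legacy_bytes(legacy): chr(uni) for legacy, uni in pairs}

def decode(data: bytes, table: dict) -> str:
    """Decode bytes with the table, preferring two-byte matches (a simplification)."""
    out, i = [], 0
    while i < len(data):
        for width in (2, 1):
            chunk = data[i:i + width]
            if len(chunk) == width and chunk in table:
                out.append(table[chunk])
                i += width
                break
        else:
            raise ValueError(f"unmapped byte 0x{data[i]:02X} at offset {i}")
    return "".join(out)
```

With the three DBCS lines above plus the 0xA2 entry, decode(b"\x81\x40\xa2\x81\x41", table) gives "\u4e02\u00a2\u4e04".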
 
My questions are:
 
1. Is there a specification for this format anywhere, and if so, where?
 
2. Is there a "Format B" or similar? (I don't mean UCM, CharMapML, RFC
1345 format, etc., but something truly similar to and/or derivative of
Format A.)
 
Please reply on-list only if you think the list at large would benefit
from your reply. I'm hoping some of the Unicode elders might have some
insight here.
 
--
Doug Ewell | Thornton, CO, US | ewellic.org
 
