I enclose an excerpt from http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/OTHER/BIG5.TXT.
It says that some TC characters X cannot be converted into UCS characters without losing round-trip compatibility. That is, in a round-trip conversion X --> UCS(X) --> X', it may turn out that X != X'. Any legacy-encoded IRIs (IDNs) that include such an X may fail to compare equal once they have been converted to and from Unicode.

I would appreciate it if anyone could present the history of the BIG5 versions and their round-trip compatibility problems in more detail.

Soobok Lee

--------------------------------------------------------------------------------
#
#	Name:             BIG5 to Unicode table (complete)
#	Unicode version:  1.1
#	Table version:    0.0d3
#	Table format:     Format A
#	Date:             11 February 1994
#
#	Copyright (c) 1991-1994 Unicode, Inc.  All Rights reserved.

(snip)

#	If you have carefully considered the fact that the mappings in
#	this table are only one possible set of mappings between BIG5 and
#	Unicode and have no normative status, but still feel that you
#	have located an error in the table that requires fixing, you may
#	report any such error to [EMAIL PROTECTED]
#
#	WARNING!  It is currently impossible to provide round-trip compatibility
#	between BIG5 and Unicode.
#
#	A number of characters are not currently mapped because
#	of conflicts with other mappings.  They are as follows:
#
#	BIG5    Description                     Comments
#
#	0xA15A  SPACING UNDERSCORE              duplicates A1C4
#	0xA1C3  SPACING HEAVY OVERSCORE         not in Unicode
#	0xA1C5  SPACING HEAVY UNDERSCORE        not in Unicode
#	0xA1FE  LT DIAG UP RIGHT TO LOW LEFT    duplicates A2AC
#	0xA240  LT DIAG UP LEFT TO LOW RIGHT    duplicates A2AD
#	0xA2CC  HANGZHOU NUMERAL TEN            conflicts with A451 mapping
#	0xA2CE  HANGZHOU NUMERAL THIRTY         conflicts with A4CA mapping
#
#	We currently map all of these characters to U+FFFD REPLACEMENT CHARACTER.
#	It is also possible to map these characters to their duplicates, or to
#	the user zone.
#
#	Notes:
#
#	1. In addition to the above, there is some uncertainty about the
#	mappings in the range C6A1 - C8FE, and F9DD - F9FE.  The ETEN
#	version of BIG5 organizes the former range differently, and adds
#	additional characters in the latter range.  The correct mappings
#	these ranges need to be determined.
#
#	2. There is an uncertainty in the mapping of the Big Five character
#	0xA3BC.  This character occurs within the Big Five block of tone marks
#	for bopomofo and is intended to be the tone mark for the first tone in
#	Mandarin Chinese.  We have selected the mapping U+02C9 MODIFIER LETTER
#	MACRON (Mandarin Chinese first tone) to reflect this semantic.
#	However, because bopomofo uses the absense of a tone mark to indicate
#	the first Mandarin tone, most implementations of Big Five represent
#	this character with a blank space, and so a mapping such as U+2003 EM
#	SPACE might be preferred.
#
#	Format:  Three tab-separated columns
#		Column #1 is the BIG5 code (in hex as 0xXXXX)
#		Column #2 is the Unicode (in hex as 0xXXXX)
#		Column #3 is the Unicode name (follows a comment sign, '#')
#			The official names for Unicode characters U+4E00
#			to U+9FA5, inclusive, is "CJK UNIFIED IDEOGRAPH-XXXX",
#			where XXXX is the code point.  Including all these
#			names in this file increases its size substantially
#			and needlessly.  The token "<CJK>" is used for the
#			name of these characters.  If necessary, it can be
#			expanded algorithmically by a parser or editor.
#
#	The entries are in BIG5 order
#
#
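
To make the round-trip problem concrete, here is a minimal Python sketch (not part of BIG5.TXT) that reads rows in the three-column format described above and lists the BIG5 codes that share a Unicode target, so that X --> UCS(X) --> X' cannot return every one of them. The names parse_table, round_trip_failures and SAMPLE are hypothetical. The two U+FFFD rows follow the table's statement that the unmapped characters go to U+FFFD; the two <CJK> rows are illustrative assumptions and should be checked against the real file.

```python
from collections import defaultdict

# A few rows in the BIG5.TXT layout: BIG5 code, Unicode code, '#' + name.
# These rows are illustrative only, not a copy of the real table.
SAMPLE = (
    "# sample rows (illustrative)\n"
    "0xA15A\t0xFFFD\t# REPLACEMENT CHARACTER\n"
    "0xA1C3\t0xFFFD\t# REPLACEMENT CHARACTER\n"
    "0xA440\t0x4E00\t# <CJK>\n"
    "0xA451\t0x5341\t# <CJK>\n"
)


def parse_table(text):
    """Return a dict mapping BIG5 code -> Unicode code point."""
    forward = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and full-line comments.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 2:
            continue
        # int(..., 16) accepts the leading "0x" prefix.
        forward[int(cols[0], 16)] = int(cols[1], 16)
    return forward


def round_trip_failures(forward):
    """Return the BIG5 codes whose Unicode target is shared with another code.

    When several BIG5 codes map to one code point, the reverse mapping can
    pick at most one of them, so the rest come back changed (X != X').
    """
    by_ucs = defaultdict(list)
    for big5, ucs in forward.items():
        by_ucs[ucs].append(big5)
    return sorted(b for group in by_ucs.values() if len(group) > 1 for b in group)


if __name__ == "__main__":
    table = parse_table(SAMPLE)
    for code in round_trip_failures(table):
        print("0x%04X -> U+%04X is not round-trip safe" % (code, table[code]))
```

Run against the full table, the same check should report every BIG5 code that collapses onto a shared code point such as U+FFFD. It only detects collisions inside one table; differences between BIG5 variants (for example the ETEN ranges mentioned in note 1) would need a comparison between two tables.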
