At 13:02 01/10/23 -0700, liana Ye wrote:
> From screen display point view, TC/SC are different glyph
> sets(who defines the sets? How is it used by 1/5 of the world
> population? Is Uicode group the only authoritive one? In
> China there are over 600 recorded views on this).

The Han ideographs in Unicode/ISO 10646 are defined by the IRG (Ideographic Rapporteur Group). This group reports to ISO/IEC JTC1/SC2/WG2, the ISO working group responsible for ISO 10646. It is composed of representatives from all the countries or similar entities interested in Han ideographs. That includes China, Japan, Korea (both South and North), Taiwan, Hong Kong, Singapore, and the US (I hope I didn't forget anybody, and please excuse the maybe politically incorrect shortcuts).

The US is the only country represented without a tradition of using Han ideographs, but it usually sends only a small delegation and mainly helps with wording. Many other countries may send rather large delegations (given the number of characters, which means a lot of work, this is no surprise). The Unicode Consortium participates in the IRG with observer status only.

The IRG has published guidelines for deciding when to unify two occurrences and when not to. Because of the very large number of characters, there is in some cases indeed a thin line as to whether something should be unified or not. And in these cases, the IRG just has to make a decision. Overall, the guidelines are somewhat difficult to understand at first, but they are designed mainly with 'least surprise to the average user' in mind, and I think they have achieved this goal very well. The guidelines are based on earlier ones used for the Japanese standard.

The core of the guidelines says that if two characters look significantly different, then they are encoded with two codes, even if they are, for example, one-to-one SC/TC equivalents. This is to avoid suddenly changing the appearance of characters for a user who may not be familiar with the significantly different shape. On the other hand, cases where there is only a small difference in shape are unified (i.e. given only one code) unless this small difference in shape makes an actual difference in meaning.

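To make the distinction above a bit more concrete, here is a small Python sketch. The character pairs are my own illustrative picks, not taken from the IRG documents, and the helper names (`describe`, `is_separately_encoded`) are made up for this example. It only shows that some well-known SC/TC pairs with clearly different shapes have two codes, while a character such as U+9AA8, whose shape differs slightly between fonts for different regions, has just one.

```python
import unicodedata

# Illustrative SC/TC pairs with clearly different shapes (chosen for this
# sketch, not quoted from the IRG guidelines). Each pair has two codes.
SEPARATELY_ENCODED = [
    ("\u56fd", "\u570b"),  # "country": simplified vs. traditional form
    ("\u6c49", "\u6f22"),  # "Han": simplified vs. traditional form
    ("\u9a6c", "\u99ac"),  # "horse": simplified vs. traditional form
]

# An illustrative unified character ("bone"): regional fonts draw it with
# small differences, but it has a single code.
UNIFIED_EXAMPLE = "\u9aa8"


def describe(ch):
    """Return the code point and the Unicode character name of ch."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def is_separately_encoded(simplified, traditional):
    """True if the two forms are distinct code points."""
    return ord(simplified) != ord(traditional)


for sc, tc in SEPARATELY_ENCODED:
    print(describe(sc), "|", describe(tc))

print("unified:", describe(UNIFIED_EXAMPLE))
```

Which shape you actually see for the unified character depends on the font (and any language tagging) used to display it, not on the code itself.
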
Overall, the result is that if you present a text where you change the glyph shapes within the range that is unified, people who have a basic education but don't know about the different shapes (e.g. people in Taiwan or Hong Kong who only know TC, people in China or Singapore who only know SC, or people in Japan who only know the forms used in Japan) will read over these changes without problems. They might at some points say 'this looks a bit strange', but will still identify the character.

There are some exceptions to this rule related to backwards-compatible roundtripping (the source separation rule).

Hope this helps,    Martin.
