Traditional and Simplified Han in UTS 39

Karl Williamson via Unicode Wed, 27 Dec 2017 13:34:44 -0800

UTS 39 says that, optionally:

"Mark Chinese strings as “mixed script” if they contain both simplified (S) and traditional (T) Chinese characters, using the Unihan data in the Unicode Character Database [UCD].

"The criterion can only be applied if the language of the string is known to be Chinese."

What does it mean for the language to "be known to be Chinese"? Is this something algorithmically determinable, or does it come from information about the input text that comes from outside the UCD?

The example given shows some Hiragana in the text. That clearly indicates the language isn't Chinese, so in this example we can algorithmically rule out that it's Chinese.
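
A minimal sketch of that kind of check, assuming that any code point in the Hiragana block is enough to rule out Chinese. The function name and the block-range test are illustrative only and are not taken from UTS 39; a real implementation would more likely consult the Script property:

```python
# Hypothetical helper, not part of UTS 39.
# Assumption: the Hiragana block (U+3040..U+309F) is used as a stand-in
# for a proper Script=Hiragana lookup, so this is only an approximation.
HIRAGANA_BLOCK_START = 0x3040
HIRAGANA_BLOCK_END = 0x309F


def contains_hiragana(text: str) -> bool:
    """Return True if any character of text lies in the Hiragana block."""
    return any(
        HIRAGANA_BLOCK_START <= ord(ch) <= HIRAGANA_BLOCK_END for ch in text
    )
```

A check like this can only rule Chinese out; a string that passes it is not thereby known to be Chinese.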


And what does Chinese really mean here?
