Pim Blokland <[EMAIL PROTECTED]> wrote:
> No. Encoded like that it may *look* like a roman three, but two of
> those are definitely not correct. Only U+2162 or its compatibility
> decomposition, U+0049 U+0049 U+0049 should be used. The other two
> are bad coding, just as using greek Iotas or combinations of U+2160
> and U+0049 would be.

It may happen when the text was initially encoded with a legacy encoding, then converted to Unicode. With legacy encodings and input methods, users tend to input the characters they have on their keyboard, and will not use the complicated keystrokes needed to enter Latin letters when the supported encoding has no dedicated Roman numerals. So you'll find Roman numerals encoded with Greek letters in many Greek texts, or with Cyrillic letters in Russian texts.

That's not uncommon, and in these legacy encodings such sequences were really treated as a compatibility decomposition, even if this does not appear in the Unicode decompositions.

In fact, most Latin, Greek and Cyrillic characters have a common origin, and the three scripts have inherited the same glyph designs and many common uses. Unicode did not attempt to unify them, even if theoretically it could have been done. This was a compromise: the legacy encodings often include both Latin and Greek letters, or both Latin and Cyrillic letters, and did not unify them either. The choice was to preserve bijective (round-trip) compatibility with those widely used encodings, and to maintain the distinction, because the characters also imply language differences and normally different contexts, which a unification in Unicode would have lost.

These missing unifications are noted in the comments of the character charts, but they are not present in the official compatibility decompositions. However, a unification remains possible later, if the text contains indications of the language used: the language gives the restricted set of characters it uses, and the most widely used legacy encodings in which such historic uses are common.
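
To make that last point concrete, here is a rough Python sketch of such a language-driven unification, not a definitive implementation. The function name fold_roman_numeral, the language tags and the two look-alike tables are my own assumptions for illustration; they are not Unicode data and they are far from complete. Only the first step, NFKC normalization of U+2160..U+216F, relies on the official compatibility decompositions.

```python
import unicodedata

# Hypothetical, incomplete look-alike tables, keyed by language tag.
# These mappings are NOT part of any Unicode decomposition data.
LOOKALIKES = {
    "el": {
        "\u0399": "I",  # GREEK CAPITAL LETTER IOTA
        "\u03A7": "X",  # GREEK CAPITAL LETTER CHI
        "\u039C": "M",  # GREEK CAPITAL LETTER MU
    },
    "ru": {
        "\u0406": "I",  # CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
        "\u0425": "X",  # CYRILLIC CAPITAL LETTER HA
        "\u0421": "C",  # CYRILLIC CAPITAL LETTER ES
        "\u041C": "M",  # CYRILLIC CAPITAL LETTER EM
    },
}

ROMAN_LETTERS = set("IVXLCDM")


def fold_roman_numeral(token, lang):
    """Return token rewritten with Latin capitals if it folds to a Roman
    numeral under the look-alike table for lang; otherwise return it as is."""
    table = LOOKALIKES.get(lang, {})
    # Official step: U+2162 and friends decompose to Latin letters under NFKC.
    folded = unicodedata.normalize("NFKC", token)
    # Unofficial step: apply the language-specific look-alike table.
    folded = "".join(table.get(ch, ch) for ch in folded)
    # Accept the result only if every character is an uppercase Roman letter.
    if folded and all(ch in ROMAN_LETTERS for ch in folded):
        return folded
    return token


print(fold_roman_numeral("\u2162", "el"))              # U+2162 alone
print(fold_roman_numeral("\u0399\u0399\u0399", "el"))  # three Greek iotas
print(fold_roman_numeral("\u0425\u0406\u0406", "ru"))  # Cyrillic HA, I, I
```

Note that the tables are selected by language, so three Greek iotas are left alone in a text tagged as Russian. The sketch handles capitals only, and it cannot tell a real Greek or Cyrillic word made solely of these letters from a numeral, so any real use would need context around the token before folding.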