José Mejuto wrote:
I think the text that says UCS2 has been extended does not mean that
UCS2 itself has been extended; it says that UCS2 has been extended
to UTF-16, so UCS2 can no longer be considered Unicode, as noted in
ISO 10646:
UCS-2. UCS-2 stands for “Universal Character Set coded in 2 octets” and
is also known as “the two-octet BMP form.” It was documented in earlier
editions of 10646 as the two-octet (16-bit) encoding consisting only of
code positions for plane zero, the Basic Multilingual Plane. This
documentation has been removed from ISO/IEC 10646:2011, and the term
UCS-2 should now be considered obsolete. It no longer refers to an
encoding form in either 10646 or the Unicode Standard.
I agree that UCS-2 no longer represents the current Unicode range, but
it is still a proper subset of UCS-4 (the BMP).
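To make the subset relation concrete, here is a minimal Python sketch (not
Lazarus code). The helper names fits_ucs2 and utf16_units are made up for
this illustration. It assumes that "fits in UCS-2" means every code point
lies in the BMP and is not a surrogate. For such text, each UTF-16 code
unit is simply the code point value.

```python
def fits_ucs2(text: str) -> bool:
    """True if every code point is in the BMP and is not a surrogate.

    Hypothetical helper: the surrogate exclusion is an assumption made
    here so that the result can always be encoded as UTF-16.
    """
    return all(
        ord(ch) <= 0xFFFF and not 0xD800 <= ord(ch) <= 0xDFFF
        for ch in text
    )


def utf16_units(text: str) -> list:
    """Return the 16-bit code units of the big-endian UTF-16 encoding."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


if __name__ == "__main__":
    bmp_text = "A\u20ac"          # 'A' and the euro sign, both in the BMP
    astral = "\U0001F600"         # a code point outside the BMP

    print(fits_ucs2(bmp_text))    # BMP-only text
    print(fits_ucs2(astral))      # needs more than 16 bits
    # For BMP-only text, code units and code points coincide.
    print(utf16_units(bmp_text) == [ord(ch) for ch in bmp_text])
```

Text outside the BMP has no 16-bit representation at all, which is where
the surrogate pairs of UTF-16 come in.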
The UCS standards define Unicode as ranges of values, while the UTF
standards define encodings.
The UTF-7/8 encodings are purely numerical compression schemes, while
UTF-16 (with surrogate pairs) more closely reflects a tree-like
structure of "planes", "groups", "blocks", "codepages" etc., favored by
the Unicode Consortium. Such a view may be interesting to font authors,
who can restrict a font to part of the full Unicode range, but it is of
little help with handling Unicode programmatically.
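As a rough illustration of the difference, the following Python sketch
computes a surrogate pair by hand and compares it with the built-in
codecs. The function names to_surrogates and from_surrogates are
hypothetical, chosen only for this example. The arithmetic is the
standard UTF-16 rule: subtract 0x10000, then split the remaining 20 bits
into two 10-bit halves.

```python
def to_surrogates(cp: int) -> tuple:
    """Split a code point above the BMP into (high, low) surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not in a supplementary plane")
    v = cp - 0x10000                      # 20-bit offset above the BMP
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)


def from_surrogates(high: int, low: int) -> int:
    """Recombine a (high, low) surrogate pair into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    cp = 0x1F600
    high, low = to_surrogates(cp)
    print(hex(high), hex(low))

    # The hand-computed pair matches the codec's UTF-16 output.
    manual = high.to_bytes(2, "big") + low.to_bytes(2, "big")
    print(manual == chr(cp).encode("utf-16-be"))

    # UTF-8 packs the same value into a variable number of bytes.
    print(chr(cp).encode("utf-8").hex())
```

Either way the program ends up with the same integer code point; the
plane structure only shows up in how the 16-bit units are laid out.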
DoDi
--
_______________________________________________
Lazarus mailing list
http://lists.lazarus.freepascal.org/mailman/listinfo/lazarus