Unicode has changed and evolved over the years. At this point, UCS-2 is a funny
beast, because it shares precisely the same encoding space as UTF-16. That is,
in code units there is absolutely no difference between them. The only real
difference is whether you interpret the code units in the range D800..DFFF as
surrogates. (And interpret them correctly, of course!)
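
To make that concrete, here is a rough sketch in Python that reads the same list of 16-bit code units both ways. The function names are just illustrative, and passing unpaired surrogates through unchanged is an assumption of this sketch, not something any standard requires.

```python
def decode_as_ucs2(units):
    # UCS-2 view: every 16-bit code unit is taken as one code point,
    # so D800..DFFF get no special treatment.
    return list(units)


def decode_as_utf16(units):
    # UTF-16 view: a code unit in D800..DBFF followed by one in
    # DC00..DFFF is combined into a single supplementary code point.
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        if (0xD800 <= u <= 0xDBFF and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            out.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        else:
            # Assumption of this sketch: unpaired surrogates pass through as-is.
            out.append(u)
            i += 1
    return out


# 'A' followed by the surrogate pair for U+1D45B
units = [0x0041, 0xD835, 0xDC5B]
ucs2_view = decode_as_ucs2(units)
utf16_view = decode_as_utf16(units)
```

The input is identical in both calls; only the reading of the last two code units differs (two values in the UCS-2 view, one code point in the UTF-16 view).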

As a serialization, UTF-16 has three forms: UTF-16, UTF-16BE, and UTF-16LE. The
first may (optionally) start with a BOM; the others are written without one. Since UCS-2 shares the
same coding space, and thus serialization, it's not really a good idea to speak
of UCS-2LE etc.; much better to just use the UTF-16 names.
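
As an illustration of the three forms, a small sketch using Python's built-in codecs (named "utf-16", "utf-16-be", and "utf-16-le" there). Note that Python's "utf-16" encoder always writes a BOM, in the machine's native byte order, so the sketch does not assume which order you get.

```python
import codecs

text = "A"

# The BE and LE forms carry no BOM; the byte order is fixed by the name.
be_bytes = text.encode("utf-16-be")   # b'\x00A'
le_bytes = text.encode("utf-16-le")   # b'A\x00'

# Plain "utf-16": Python prepends a BOM in native byte order.
bom_bytes = text.encode("utf-16")
has_bom = bom_bytes[:2] in (codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)

# On decoding, the "utf-16" codec uses the BOM to pick the byte order
# and strips it from the result.
from_be_bom = (codecs.BOM_UTF16_BE + be_bytes).decode("utf-16")
from_le_bom = (codecs.BOM_UTF16_LE + le_bytes).decode("utf-16")
```

Text restricted to the UCS-2 subset produces exactly the same bytes under these names, which is why separate UCS-2LE and UCS-2BE labels add nothing.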

The best way I find to think of UCS-2 at this point is *not* as another
encoding, but rather simply as a shorthand
for a particular supported subset of UTF-16. In that way, it is like other
subsets: for example, I can talk about the Cyrillic-block repertoire in UTF-16.
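
Seen that way, "is this text UCS-2?" becomes a repertoire check rather than an encoding question. A minimal sketch, assuming the UCS-2 subset means code points up to U+FFFF and taking the Cyrillic block as U+0400..U+04FF; the helper names are made up for illustration.

```python
def in_ucs2_subset(text):
    # Assumption: the UCS-2 subset is everything representable in a
    # single 16-bit code unit, i.e. no supplementary code points.
    return all(ord(ch) <= 0xFFFF for ch in text)


def in_cyrillic_block(text):
    # The Cyrillic block, U+0400..U+04FF, as another example subset.
    return all(0x0400 <= ord(ch) <= 0x04FF for ch in text)


bmp_only = in_ucs2_subset("Mark")             # True
supplementary = in_ucs2_subset("\U0001D45B")  # False: needs a surrogate pair
cyrillic = in_cyrillic_block("Привет")        # True
```

Either way the bytes on the wire are plain UTF-16; the subset only says which characters you promise to use.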

Mark
