just a hypothesis, but i'm guessing that at the time they put this together, both of the major platforms dealing with DOM (win32 and java) used ucs-2 internally (and now use utf-16).
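for reference, here's the ucs-2 vs utf-16 distinction in one minimal java sketch (the class name is mine; the String/Character calls are standard library): a character outside plane 0 is one code point but two 16-bit code units in utf-16, which is exactly the case ucs-2 could not represent at all.

    // U+1D11E (MUSICAL SYMBOL G CLEF) lies outside plane 0, so
    // Java's UTF-16 strings store it as a surrogate pair: two
    // 16-bit units for one code point. UCS-2, with its fixed
    // one-unit-per-character model, has no way to express it.
    public class Utf16Demo {
        public static void main(String[] args) {
            String clef = new String(Character.toChars(0x1D11E));

            System.out.println(clef.length());          // 2 code units
            System.out.println(
                clef.codePointCount(0, clef.length())); // 1 code point

            // the individual units are surrogates, meaningless alone
            System.out.printf("U+%04X U+%04X%n",
                (int) clef.charAt(0), (int) clef.charAt(1));
            // prints: U+D834 U+DD1E
        }
    }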
even today, win32 and java mostly do not use utf-8. the only form widely supported outside of linux and unix systems is utf-8 prefixed with a fictitious "byte order mark" (the three bytes EF BB BF; useless, since a byte-oriented encoding has no byte order to mark), which is of course incompatible with tools used on unix and linux, and with many web browsers. Notepad uses this form.

Java uses a bunch of incompatible utf-8 "extensions" in its serializations: incorrect encoding of NUL, and incorrect encoding of planes 1 through 16 as pairs of utf-8 sequences corresponding to the individual surrogate codes (a concrete byte-level sketch follows at the end of this message). unfortunately this is perpetuated in several network protocols, and it's what one ends up speaking when interfacing to Oracle or MySQL, for example.

even on mac os x, where utf-8 is the encoding used for unix-type filesystem access, it's still not the default text encoding in TextEdit, and utf-8 text files "don't work" (i.e. they open as MacRoman or whatever Mac* encoding is paired with the OS language). fortunately this is configurable; unfortunately changing it breaks all sorts of other stuff (apps frequently still ship with macroman README files, etc.).

so basically, if you want it to work i recommend switching to linux, unix, plan 9, or similar :(
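since the java "extensions" are easy to get wrong from a prose description, here's a minimal byte-level sketch. the class and helper names are mine, but DataOutputStream.writeUTF and its "modified utf-8" output are exactly as documented for java.io.DataInput:

    import java.io.ByteArrayOutputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;

    // "Modified UTF-8" as written by DataOutputStream.writeUTF:
    // NUL becomes the overlong pair C0 80, and a character beyond
    // plane 0 becomes two 3-byte sequences (one per UTF-16
    // surrogate) instead of one 4-byte sequence.
    public class ModifiedUtf8Demo {
        static String hex(byte[] b, int skip) {
            StringBuilder sb = new StringBuilder();
            for (int i = skip; i < b.length; i++)
                sb.append(String.format("%02X ", b[i] & 0xFF));
            return sb.toString().trim();
        }

        public static void main(String[] args) throws IOException {
            // NUL followed by U+10400 (a plane 1 character)
            String s = "\0" + new String(Character.toChars(0x10400));

            // standard utf-8: 00 F0 90 90 80
            System.out.println(hex(s.getBytes(StandardCharsets.UTF_8), 0));

            // modified utf-8 (skipping writeUTF's 2-byte length
            // prefix): C0 80 ED A0 81 ED B0 80
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            new DataOutputStream(bos).writeUTF(s);
            System.out.println(hex(bos.toByteArray(), 2));
        }
    }

the 6-bytes-per-astral-character form is essentially what was later standardized under the name CESU-8, which gives you some idea of how widespread the mistake is.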
On 3/17/07, Christopher Fynn <[EMAIL PROTECTED]> wrote:

Colin Paul Adams wrote:
>>>>>> "Rich" == Rich Felker <[EMAIL PROTECTED]> writes:
>
> Rich> Indeed, this was what I was thinking of. Thanks for
> Rich> clarifying. BTW, any idea WHY they brought the UTF-16
> Rich> nonsense to DOM/DHTML/etc.?
>
> I don't know for certain, but I can speculate well, I think.
> DOM was a micros**t invention (and how it shows!). NT was UCS-2
> (effectively).

AFAIK Unicode was originally only planned to be a 16-bit encoding. The Unicode Consortium and ISO 10646 then agreed to synchronize the two standards, though originally Unicode was only going to be a 16-bit subset of the UCS. A little after that, Unicode decided to support UCS characters beyond plane 0. Anyway, at the time NT was being designed (late eighties) Unicode was supposed to be limited to < 65536 characters and UTF-8 hadn't been thought of, so 16 bits probably seemed like a good idea.
-- Linux-UTF8: i18n of Linux on all levels Archive: http://mail.nl.linux.org/linux-utf8/
