It must be a full moon on Halloween, because here I am in the extremely unfamiliar position of disagreeing quite strongly with Ken Whistler.
In a message dated 2001-10-31 17:16:25 Pacific Standard Time, [EMAIL PROTECTED] writes:

> As current Czar of Names Rectification, I must start protesting
> here. SCSU is a means of *compressing* Unicode text. It is
> not "[an]other method of encoding Unicode characters."

I was about to reply, "Of course it is," before I realized that Ken was interpreting the word "encoding" in the strictest sense, invoking the distinction between character encoding forms (CEFs) and transfer encoding syntaxes (TESs). In some cases this is a worthwhile distinction, but I don't think it is relevant in the case of David's query, or, for that matter, in many other cases where users may think of Unicode text being "represented" as UTF-32, UTF-16, UTF-8, SCSU, ASCII with UCN sequences, or even (God forbid) CESU-8. SCSU is indeed another method of "representing" Unicode characters, if not necessarily "encoding" them in the strict sense of the word.

> And before going on, I'm not clear exactly what you are
> trying to do. SCSU is defined on UTF-16 text. It would, of
> course, be possible to create SCSU-like windowing compression
> schemes that would work on UTF-32 or UTF-8 text, but those are
> not part of UTS #6 as it is currently written.

Like David, I don't see how SCSU is defined on, or limited to, UTF-16 text, except in the sense that literal or quoted "Unicode-mode" SCSU text is UTF-16. SCSU is defined on Unicode scalar values, which are not tied to a particular CEF. You can define a window in what SCSU calls "the expansion space" using the SDX or UDX tag and, in the best case, store N characters of Gothic or Deseret text in N + 3 bytes. None of this has anything to do with surrogates or 16-bitness.

In a message dated 2001-10-31 17:59:33 Pacific Standard Time, [EMAIL PROTECTED] writes:

> I have no quarrel with the claim that the SCSU scheme could be
> implemented directly on UTF-32 data. But as Unicode Technical Standard
> #6 is currently written, that is not how to do it conformantly.

I have looked throughout UTS #6 and cannot find anything, explicit or implicit, to the effect that SCSU could not be conformantly implemented against UTF-32 data. Sections 6.1.3 and 8.1 refer to how "surrogate pairs" may be encoded (*) in SCSU, but if you substitute the phrase "non-BMP characters" the meaning is identical.

(*) The word "encoded" was taken directly from UTS #6, section 8.1.

> At the moment, if you want to compare SCSU-compressed text
> against the UTF-32 form, you would have to convert the UTF-32
> text to UTF-16, and then compress it using SCSU. You don't
> apply SCSU directly to UTF-32 data.

Why not? The fact that UTS #6 was originally written before UTF-32 was formally defined has nothing to do with this. The same could be said for UTF-8, which (like SCSU) has a surrogate-free mechanism for representing non-BMP characters.

> It seems to me that a rewrite of SCSU would be in order to explicitly
> allow and define UTF-32 implementations as well as UTF-16 implementations
> of SCSU.

I don't see anything that needs rewriting. What are you seeing?

-Doug Ewell
 Fullerton, California
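
To make the N + 3 figure above concrete, here is a minimal Python sketch of the best case only: a run of non-BMP characters that all fit in one 128-code-point window. The function name scsu_encode_expansion_run is hypothetical, and the tag value (SDX = 0x0B) and the hbyte/lbyte layout reflect my reading of UTS #6; treat them as assumptions to check against the specification. The sketch also assumes the window starts at the nearest multiple of 0x80 at or below the first character. It is nowhere near a full SCSU encoder. Note that it works directly on scalar values (ord) and never touches a surrogate.

```python
SDX = 0x0B  # single-byte-mode tag: define (and select) a window above U+FFFF

def scsu_encode_expansion_run(text: str, window: int = 0) -> bytes:
    """Encode text whose characters all fit in one 128-code-point window
    in the expansion space (U+10000 and above). Best case only."""
    if not text:
        return b""
    if not 0 <= window <= 7:
        raise ValueError("SCSU dynamic windows are numbered 0 through 7")

    scalars = [ord(ch) for ch in text]   # Unicode scalar values, no surrogates
    offset = scalars[0] & ~0x7F          # assumed choice: align down to 0x80
    if offset < 0x10000:
        raise ValueError("this sketch handles only non-BMP characters")
    if any(not (offset <= cp <= offset + 0x7F) for cp in scalars):
        raise ValueError("characters do not fit in a single window")

    # 16-bit argument: top 3 bits = window number, low 13 bits = window index,
    # where offset = 0x10000 + 0x80 * index.
    index = (offset - 0x10000) >> 7
    word = (window << 13) | index

    out = bytearray([SDX, word >> 8, word & 0xFF])       # 3 bytes of overhead
    out.extend(0x80 + (cp - offset) for cp in scalars)   # 1 byte per character
    return bytes(out)


if __name__ == "__main__":
    deseret = "\U00010400\U00010401\U00010402"  # three Deseret letters
    encoded = scsu_encode_expansion_run(deseret)
    print(encoded.hex(" "), len(encoded))
    print(len(deseret.encode("utf-16-le")), len(deseret.encode("utf-32-le")))
```

For the three Deseret letters this yields six bytes (three of overhead plus one per character), against twelve bytes in either UTF-16 or UTF-32. The input could just as well have arrived as UTF-32 code units as UTF-16 surrogate pairs; the output is the same either way.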

