Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:

> As I have not determined the correct size of these bitfields, I need
> some intermediate solution to pack them a little, and the UTF-8 TES
> (not the UTF-8 CES used by Unicode)venient for now, until I change it
> to a better encoding, which may or may not leak out (I am not sure
> that I need to make the encoding accessible from an interface, except
> for debugging).

I hope I understand the "venient" passage correctly.

I'm pretty sure you mean "... the UTF-8 CES (not the UTF-8 CEF used by
Unicode)..."  A CEF maps code points to code units, and you don't mean
that because you're not mapping Unicode code points.  A CES, on the
other hand, maps code units to bytes, and that *is* what you are doing
with the code units in your internal mechanism: mapping them to bytes
using the original 31-bit definition of UTF-8.

A TES is a very specific thing.  Apparently this term is reserved for
mappings that explicitly solve a particular problem, such as MIME
compatibility or compression.  So quoted-printable is a good example of
a TES, because it makes an arbitrary text stream -- already encoded in
UTF-8, Windows code page 1252, or whatever -- transferable through
mechanisms that support RFC 822, avoiding all of the bytes that mean
something special.  Likewise, Base64 is applied directly to an arbitrary
byte stream, which means the data was already encoded in a CES before
applying the additional Base64 layer.

I've always had trouble with the assertion that SCSU (for example) is a
TES rather than a CES.  Certainly it solves a particular problem
(compression) and avoids, to an extent, gratuitous use of bytes like 0D
and 0A.  However, it is applied to a sequence of *Unicode code points*,
not code units, and certainly not bytes the way QP is.  You don't take
the UTF-8-encoded stream <C2 BF 51 75 C3 A9 3F> and encode *those seven
bytes* in SCSU; rather, you encode the stream of five Unicode code
points <00BF 0051 0075 00E9 003F>.

That said, the definitions in UTR #17 were surprisingly difficult for me
to wrap my brain around in general, so I might be off-base on some of
this.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/
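
A minimal sketch in Python of the layering described above, assuming the
five code points <00BF 0051 0075 00E9 003F> from the example.  The
variable names are illustrative only, and UTF-16 is used for the code
unit step simply because it makes the CEF/CES split visible.  SCSU is
left out because the Python standard library has no SCSU codec.

```python
import base64
import quopri
import struct

# The example text: five Unicode code points <00BF 0051 0075 00E9 003F>.
text = "\u00bfQu\u00e9?"
code_points = [ord(ch) for ch in text]

# CEF step (code points -> code units), shown with UTF-16.
# All five code points are in the BMP, so there are five 16-bit units.
utf16_be = text.encode("utf-16-be")
code_units = list(struct.unpack(">5H", utf16_be))

# CES step (code units -> bytes): the same code units serialize to
# different byte sequences depending on the byte order chosen.
utf16_le = text.encode("utf-16-le")

# UTF-8 uses 8-bit code units, so its CEF and CES produce the same bytes.
utf8_bytes = text.encode("utf-8")  # the seven bytes C2 BF 51 75 C3 A9 3F

# TES step (bytes -> bytes): quoted-printable and Base64 operate on a
# byte stream that has already been through a CES.
qp_bytes = quopri.encodestring(utf8_bytes)
b64_bytes = base64.b64encode(utf8_bytes)

print([f"{cp:04X}" for cp in code_points])
print(utf16_be.hex(" "), "|", utf16_le.hex(" "))
print(utf8_bytes.hex(" "))
print(qp_bytes, b64_bytes)
```

Note that quopri and base64 accept only bytes here, never a str, which
matches the point that QP and Base64 sit on top of an existing CES.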

