> Donald Eastlake 3rd <[EMAIL PROTECTED]> wrote on the IETF list:
>
> > There is now a standard way to encode URIs containing arbitrary
> > UNICODE characters. This is described in RFC 3275 (which is
> > currently a Draft Standard), in Section 4.3.3.1, and in the
> > corresponding W3C document and has appeared in other W3C documents,
> > for example XML Base.
>
> So U+00E1 LATIN SMALL LETTER A WITH ACUTE (á), which is 0xC3 0xA1 in
> UTF-8, is encoded as "%C3%A1" (six bytes) according to RFC 3275. All
> BMP characters above U+07FF, including all CJK characters, take three
> UTF-8 bytes and thus nine RFC 3275 bytes.
>
> I thought CJK users and others wanted *better* compression.
>
> (No, David, I know you're not all the same person. I heard lots of
> voices saying the same thing.)

"%" is not among the characters previously allowed in domain names anyway, so why turn each CJK character into 9 bytes with %-escaping instead of just using the 3 bytes of UTF-8? (This is only my own opinion, not that of all CJK users :>. I wish my voice could represent all CJK users and that they agreed with it.)
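
To make the byte counts in this thread concrete, here is a minimal sketch in Python that compares the raw UTF-8 length of a character with the length of its %-escaped form. It assumes that the escaping discussed above is ordinary URI percent-encoding of the UTF-8 bytes, as done by the standard library's urllib.parse.quote; the helper name sizes and the sample CJK character U+4E2D are chosen here purely for illustration and do not come from the thread.

```python
from urllib.parse import quote


def sizes(ch):
    """Return (UTF-8 byte count, %-escaped byte count, %-escaped text) for ch."""
    utf8 = ch.encode("utf-8")
    # safe="" forces every non-unreserved byte to be escaped as %XX.
    escaped = quote(ch, safe="")
    return len(utf8), len(escaped), escaped


# U+00E1 (the example from the quoted message) and U+4E2D (an illustrative CJK character).
for ch in ("\u00e1", "\u4e2d"):
    n_utf8, n_escaped, text = sizes(ch)
    print(f"U+{ord(ch):04X}: {n_utf8} UTF-8 bytes, {n_escaped} escaped bytes ({text})")
```

Each UTF-8 byte outside the unreserved ASCII set becomes three ASCII bytes ("%" plus two hex digits), which is where the threefold growth comes from.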
