On Dec 20, 2007, at 10:14 AM, Kieran Benton wrote:
Are we saying that as long as you use UTF-8 for the key, and that it is
not longer than 250 bytes, then all is fine with both text and binary
protocols? If so then I think we should update the docs to say so and be
happy :)

It has nothing to do with UTF-8. There is no good reason to specify that in the documentation. It's just a bunch of bytes (or octets, if you prefer) with some specific byte values forbidden. The server does not check the bytes in the key to make sure they form valid UTF-8 sequences. You can use ASCII or UTF-8 or ISO-8859-1 or ISO-8859-5 or KOI-8 or GB-18030 or a random-number generator, so long as you avoid the forbidden bytes. It does not even have to be a human-readable key; it could be a raw hash value with certain bytes escaped. (Though obviously that makes ad-hoc debugging a bit painful.)
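
To make that concrete, here is a minimal client-side sketch in Python. It assumes the forbidden bytes are the control characters and the space character (0x00-0x20 and 0x7F) and uses the 250-byte limit mentioned above; check both against the protocol document before relying on them. The names is_forbidden, is_valid_key and escape_key are made up for this example and do not come from any client library, and the percent-escaping scheme is just one possible choice.

```python
MAX_KEY_BYTES = 250  # length limit as discussed in this thread


def is_forbidden(b: int) -> bool:
    # Assumption: control characters (0x00-0x1F, 0x7F) and space (0x20)
    # are the forbidden byte values.
    return b <= 0x20 or b == 0x7F


def is_valid_key(key: bytes) -> bool:
    # The check looks only at raw byte values and length.
    # It never tries to decode the key in any character encoding.
    return 0 < len(key) <= MAX_KEY_BYTES and not any(is_forbidden(b) for b in key)


def escape_key(raw: bytes) -> bytes:
    # Hypothetical escaping scheme: replace each forbidden byte, and '%'
    # itself, with '%' followed by two hex digits. The output may be up to
    # three times longer than the input, so re-check the length afterwards.
    out = bytearray()
    for b in raw:
        if is_forbidden(b) or b == 0x25:
            out += b"%%%02X" % b
        else:
            out.append(b)
    return bytes(out)
```

With this, "ключ".encode("koi8-r") and "ключ".encode("utf-8") both pass is_valid_key unchanged, while a raw 16-byte digest would go through escape_key first.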

If we say "keys can be UTF-8" in the documentation, then some poor Russian programmer, say, who is otherwise working in KOI-8 encoding is going to add unnecessary code to a client library to transform KOI-8 to UTF-8 so as to comply with the protocol spec.

-Steve
