On Dec 20, 2007, at 10:14 AM, Kieran Benton wrote:
Are we saying that as long as you use UTF-8 for the key, and that it is
no longer than 250 bytes, then all is fine with both text and binary
protocols? If so then I think we should update the docs to say so and be
happy :)
It has nothing to do with UTF-8. There is no good reason to specify
that in the documentation. It's just a bunch of bytes (or octets, if
you prefer) with some specific byte values forbidden. The server does
not check the bytes in the key to make sure they form valid UTF-8
sequences. You can use ASCII or UTF-8 or ISO-8859-1 or ISO-8859-5 or
KOI-8 or GB-18030 or a random-number generator, so long as you avoid
the forbidden bytes. It does not even have to be a human-readable key;
it could be a raw hash value with certain bytes escaped. (Though
obviously that makes ad-hoc debugging a bit painful.)
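To make the "just bytes" point concrete, here is a minimal sketch of a client-side key check. The names (is_valid_key, escape_key, FORBIDDEN_BYTES) are hypothetical and not taken from any client library. It assumes the forbidden bytes are the control characters and the space character, and that an empty key is rejected; check those assumptions against the protocol documentation for your server version. The percent-escaping is only one possible scheme for getting a raw hash past the check.

```python
MAX_KEY_LENGTH = 250  # byte limit discussed in this thread

# Assumption: "forbidden" means control characters (0x00-0x1F, 0x7F) and
# the space character (0x20). Adjust the set if your server differs.
FORBIDDEN_BYTES = frozenset(range(0x00, 0x21)) | {0x7F}


def is_valid_key(key: bytes) -> bool:
    """Check length and forbidden bytes only; no character encoding is assumed."""
    if not 0 < len(key) <= MAX_KEY_LENGTH:  # assumption: empty keys are rejected
        return False
    return not any(b in FORBIDDEN_BYTES for b in key)


def escape_key(raw: bytes) -> bytes:
    """Percent-escape forbidden bytes (and '%' itself) in arbitrary binary data."""
    out = bytearray()
    for b in raw:
        if b in FORBIDDEN_BYTES or b == 0x25:  # 0x25 is '%'
            out += b"%%%02X" % b
        else:
            out.append(b)
    return bytes(out)


# The same word passes in any of these encodings, with no transcoding step.
word = "ключ"
for encoding in ("utf-8", "koi8-r", "iso-8859-5"):
    assert is_valid_key(word.encode(encoding))

# A space is rejected as-is but accepted once escaped.
assert not is_valid_key(b"a b")
assert escape_key(b"a b") == b"a%20b"
```

Note that the check never decodes the key. It only looks at the length and the individual byte values, which is why the encoding the client happens to use does not matter.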
If we say "keys can be UTF-8" in the documentation, then some poor
Russian programmer, say, who is otherwise working in KOI-8 encoding is
going to add unnecessary code to a client library to transform KOI-8
to UTF-8 so as to comply with the protocol spec.
-Steve