That seems fine to me, but we don't actually need to forbid 0x7F.
Memcached doesn't do anything special with that byte.
-Steve
On Dec 20, 2007, at 11:34 AM, Aaron Stone wrote:
This is pretty verbose, but hopefully it will cut way down on this FAQ:
Keys are limited in length to 250 octets. Octets in the key MUST NOT
have value 0x20 or less, nor value 0x7F (corresponding to ASCII space
and all control characters below it, and ASCII del, respectively).
Octets MAY have their 'high bits' set.
Note: The UTF-8 character encoding produces output octets which meet
these requirements. Please be aware that some characters may be
represented as more than one octet. Refer to your language's string
length functions to ensure that you are producing keys of 250 or fewer
_octets_ and not simply 250 or fewer _characters_.
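To make the characters-versus-octets point concrete, here is a small Python sketch (the language choice is mine, not the thread's). It assumes the client encodes keys as UTF-8 before sending them; `fits_key_limit` and `MAX_KEY_OCTETS` are hypothetical names used only for illustration:

```python
MAX_KEY_OCTETS = 250  # limit from the draft text above


def fits_key_limit(key: str, encoding: str = "utf-8") -> bool:
    """Hypothetical helper: measure the key in encoded octets, not characters."""
    return len(key.encode(encoding)) <= MAX_KEY_OCTETS


# "café-" is 5 characters, but "é" takes 2 octets in UTF-8.
key = "caf\u00e9-" * 50
print(len(key))                  # 250 characters
print(len(key.encode("utf-8")))  # 300 octets
print(fits_key_limit(key))       # False: too long once encoded
```

In Python, `len()` on a `str` counts code points, so the limit has to be checked against the encoded `bytes`.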
I forgot about that ASCII 127 deal until I re-read 'man ascii' just
now. I assume we need to restrict that, too, so I put it in the text
above.
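A minimal client-side check for the rules as drafted above might look like the Python sketch below. `is_valid_key` and its `forbid_del` flag are hypothetical, not part of memcached or any client library; the flag exists because the thread disagrees on whether 0x7F must be rejected (Steve notes memcached does nothing special with that byte):

```python
MAX_KEY_OCTETS = 250  # length limit from the draft text


def is_valid_key(key: bytes, forbid_del: bool = True) -> bool:
    """Hypothetical validator for the drafted key rules."""
    if len(key) > MAX_KEY_OCTETS:
        return False
    for octet in key:
        # 0x20 and below: ASCII space and all control characters.
        if octet <= 0x20:
            return False
        # 0x7F (DEL): forbidden in the draft, optional per Steve's reply.
        if forbid_del and octet == 0x7F:
            return False
    # Octets with the high bit set (0x80-0xFF) are allowed.
    return True


print(is_valid_key(b"user:42"))                   # True
print(is_valid_key(b"user 42"))                   # False: contains a space
print(is_valid_key(b"a\x7fb"))                    # False: DEL forbidden
print(is_valid_key(b"a\x7fb", forbid_del=False))  # True
```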
Do you think this text still inadvertently suggests that we require UTF-8?
Aaron
On Thu, Dec 20, 2007, Kieran Benton <[EMAIL PROTECTED]> said:
Point taken - that was something I hadn't considered.
I still think it's a good idea to add a footnote to that section of
the docs noting that UTF-8 is a "safe" encoding to use, since it is so
popular in Western systems and many devs might not know whether it
fulfills the criteria (I certainly didn't from a brief scan). That's
assuming, of course, that it's decided by the end of this thread that
it can be used generically! :)
Cheers,
Kieran
-----Original Message-----
From: Steven Grimm [mailto:[EMAIL PROTECTED]
Sent: 20 December 2007 18:32
To: Kieran Benton
Cc: Dustin Sallings; a.; [email protected]
Subject: Re: What is a valid key?
On Dec 20, 2007, at 10:14 AM, Kieran Benton wrote:
Are we saying that as long as you use UTF-8 for the key, and it is
not longer than 250 bytes, then all is fine with both the text and
binary protocols? If so, then I think we should update the docs to say
so and be happy :)
It has nothing to do with UTF-8. There is no good reason to specify
that in the documentation. It's just a bunch of bytes (or octets, if
you prefer) with some specific byte values forbidden. The server does
not check the bytes in the key to make sure they form valid UTF-8
sequences. You can use ASCII or UTF-8 or ISO-8859-1 or ISO-8859-5 or
KOI-8 or GB-18030 or a random-number generator, so long as you avoid
the forbidden bytes. It does not even have to be a human-readable key;
it could be a raw hash value with certain bytes escaped. (Though
obviously that makes ad-hoc debugging a bit painful.)
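To illustrate that a key is just bytes, the Python sketch below encodes the same Russian word in KOI8-R and in UTF-8 (both byte strings pass the check), then escapes a raw SHA-1 digest so it avoids the forbidden octets. The helper names `has_forbidden_octet` and `escape_octets` are hypothetical, and the percent-style escaping scheme is an assumption chosen for illustration; the thread does not prescribe one. It treats 0x7F as forbidden, following the more conservative draft rule:

```python
import hashlib


def has_forbidden_octet(key: bytes) -> bool:
    """Hypothetical check: 0x20 and below, plus 0x7F (conservative draft rule)."""
    return any(octet <= 0x20 or octet == 0x7F for octet in key)


def escape_octets(raw: bytes) -> bytes:
    """Hypothetical percent-style escaping of forbidden octets.

    '%' itself is escaped too, so the transformation stays reversible.
    """
    out = bytearray()
    for octet in raw:
        if octet <= 0x20 or octet in (0x25, 0x7F):
            out += b"%%%02X" % octet  # e.g. 0x20 -> b"%20"
        else:
            out.append(octet)
    return bytes(out)


# Same word, two encodings: different bytes, both usable as keys.
word = "ключ"
koi8_key = word.encode("koi8_r")  # 4 octets
utf8_key = word.encode("utf-8")   # 8 octets
print(has_forbidden_octet(koi8_key), has_forbidden_octet(utf8_key))  # False False

# A raw digest may contain forbidden octets; escaping removes them.
digest = hashlib.sha1(b"some long application key").digest()
hash_key = escape_octets(digest)
print(has_forbidden_octet(hash_key))  # False
```

A 20-octet SHA-1 digest grows to at most 60 octets after this escaping, comfortably under the 250-octet limit.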
If we say "keys can be UTF-8" in the documentation, then some poor
Russian programmer, say, who is otherwise working in KOI-8 encoding is
going to add unnecessary code to a client library to transform KOI-8
to UTF-8 so as to comply with the protocol spec.
-Steve