In my opinion:

A key can be up to 250 bytes. It may not contain:
  null (0x00)
  space (0x20)
  tab (0x09)
  newline (0x0a)
  carriage-return (0x0d)

Beyond that, memcached shouldn't care. If your keys are UTF-8, fine. If not, fine -- as long as they stay within 250 bytes and avoid the bytes listed above, memcached will just treat them as opaque binary blobs.
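
To make the rule concrete, here is a minimal sketch of a client-side check built only from the two constraints above (length limit and forbidden bytes). The names is_valid_key, MAX_KEY_BYTES and FORBIDDEN_BYTES are made up for illustration; they are not part of memcached or of any client library.

```python
# Hypothetical helper: checks only the byte-level rules proposed above.
MAX_KEY_BYTES = 250

# null, space, tab, newline, carriage-return (as integer byte values)
FORBIDDEN_BYTES = frozenset(b"\x00 \t\n\r")


def is_valid_key(key: bytes) -> bool:
    """Return True if the key fits in 250 bytes and has no forbidden byte."""
    if len(key) > MAX_KEY_BYTES:
        return False
    # Iterating over bytes yields integers, which match FORBIDDEN_BYTES.
    return not any(b in FORBIDDEN_BYTES for b in key)


if __name__ == "__main__":
    for candidate in (b"user:42", "café".encode("utf-8"), b"two words"):
        print(candidate, is_valid_key(candidate))
```

Note that the check works on bytes, not characters: the caller picks the encoding, and the limit applies to the encoded length.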

UTF-8, for those who don't know, cannot introduce any of the above forbidden characters as part of its multibyte sequences. Every byte of a multibyte UTF-8 sequence is in the 0x80-0xFF range (actually more restricted than that: lead bytes are 0xC2-0xF4 and continuation bytes are 0x80-0xBF).

UTF-16 or UTF-32 would likely cause problems, since both pad ASCII-range characters with 0x00 bytes, but that's fine -- the rules above, being based on raw bytes, will pretty much imply that.
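
A quick way to see the difference between the encodings is to encode the same text each way and look for the forbidden bytes listed above. This is only an illustrative sketch; forbidden_in and the sample strings are invented for the example.

```python
# Hypothetical helper for comparing encodings against the forbidden bytes.
FORBIDDEN_BYTES = frozenset(b"\x00 \t\n\r")


def forbidden_in(text: str, encoding: str) -> set:
    """Return the set of forbidden byte values produced by encoding text."""
    return set(text.encode(encoding)) & FORBIDDEN_BYTES


if __name__ == "__main__":
    sample = "ключ-日本語-é"  # no spaces or control characters
    for encoding in ("utf-8", "utf-16-le", "utf-32-le"):
        hits = sorted(hex(b) for b in forbidden_in(sample, encoding))
        print(encoding, hits)

    # U+0A20 is a single character whose UTF-16 code unit is made of the
    # bytes 0x0A and 0x20, i.e. newline and space at the byte level.
    print(sorted(hex(b) for b in forbidden_in("\u0a20", "utf-16-be")))
```

The UTF-8 encoding of the sample contains no forbidden byte, while the UTF-16 and UTF-32 encodings contain 0x00 as soon as any ASCII-range character (here the hyphen) appears.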

-Steve


On Dec 19, 2007, at 10:30 AM, Dustin Sallings wrote:


I just got a bug report for my client regarding multibyte characters within a key. In order to fix it, I need to know what *should* be allowed in a key.

The protocol document is fairly vague about what makes up a key. It names some specific characters that *aren't* valid, but seems to have been written with an ASCII mindset.

In the binary protocol, we have a lot of freedom, but that freedom doesn't extend to the text protocol.

Should we constrain keys to ASCII, or force clients to understand UTF-8 (or some other specific encoding)?

--
Dustin Sallings

