I second your reservation about md5.  We just use crc32 as well.

Regarding interoperability, all of our clients in C++/PHP/Python go through
the same C memcache client so we get it by default.  We don¹t do consistent
hashing with the original pure PHP client because it¹s too inefficient.

On 3/9/08 6:21 AM, "Tomash Brechko" <[EMAIL PROTECTED]> wrote:

> On Sun, Mar 09, 2008 at 00:40:32 +0100, Henrik Schröder wrote:
>> > When placing the servers on the continuum, you have to choose how many
>> times
>> > each server appears, which hashing algorithm you use to generate the
>> places,
>> > and which scheme you use to get each successive point for a given server.
> 
> Indeed, when we say the client implements the Ketama consistent
> hashing algorithm, we mostly mean the idea of mapping servers onto
> points on the ring, but not necessarily the same implementation as in
> libketama.
> 
> While developing Cache::Memcached::Fast we considered using libketama,
> but it does too many things (I/O, shared memory, synchronization),
> plus it uses MD5 hash, which I think is an overkill: there's no
> requirement for server name hash to be irreversible, plus MD5 is 16
> bytes when only 4 bytes are needed.  It's not a question of
> performance of course, as server additions/deletions are rare, but
> more a matter of minimalistic approach.
> 
> Here's what Cache::Memcached::Fast does instead:
> 
>  - each server is mapped onto <ketama_points> * <weight> rounded to
>    nearest integer (both are parameters, server weight may be rational).
> 
>  - hash function is:
> 
>      hash = crc32(host);
>      hash = crc32_add(hash, "\0");
>      hash = crc32_add(hash, port); // port is a decimal string
>      for (i = 0; i < point_count; ++i)
>        {
>          buf[0] = i & 0xff;
>          buf[1] = (i >> 8) & 0xff;
>          buf[2] = (i >> 16) & 0xff;
>          buf[3] = (i >> 24) & 0xff;
>          point = crc32_add(hash, buf);
>          ...
>        }
> 
>    i.e. each point is a CRC32 hash of the concatenation of host name,
>    null byte, port number in decimal, and four bytes of the sequential
>    point index, least significant byte first.
> 
>  - keys are hashed with CRC32.  The key is mapped to the server with
>    the nearest point that is not less than CRC32(key) (or wraps around
>    to the server with min POINT(server), when CRC32(key) is greater
>    than any point).
> 
>  - servers are added in order.  When there's already the server with
>    the same POINT(server), the new one is added _after_ the old ones.
>    When looking for the server for a given key, and there are several
>    servers with the same POINT(server), we take the _first_ one (I
>    actually fixed this step in Cache::Memcached::Fast only today).
> 
> The implementation can be viewed at
> 
>  
> http://git.openhack.ru/?p=Cache-Memcached-Fast.git;a=blob;f=src/dispatch_key.c
> 
> functions ketama_crc32_add_server() and ketama_crc32_get_server().
> 
> 
> I'm sure not everyone will like it if I say "let's take my approach as
> the standard one".  However, if your client is already compatible with
> Cache::Memcached (which is
> 
>   server = servers[((CRC32(key) >> 16) & 0x7fff) % total_servers];
> 
> ), then you already have implemented CRC32 (and also have the notion
> of server weights).  So there's no point in introducing yet another
> hash function, and the above scheme looks like a fine approach.  If we
> are to agree on any standard, I wish it will be close to the above.
> 


Reply via email to