Hi,

I believe the consistent hashing implementation in pecl/memcache is simply a C port of the original one from Set-ConsistentHash, available from

 http://search.cpan.org/~bradfitz/Set-ConsistentHash/

What algorithms are others using? From what I can pick up from the libmemcached sources it works much the same but without the precomputed 1024 element array. The 100 (or 160 in the case of ketama) points per server is still computed and sorted, so consistent hashing will always be slower than the simple modulo operation of the standard algorithm.

It does however make sense to skip the precomputed array and instead search the continuum for a matching server on each request. One would have to make more than 1024 requests for the array to pay off, something the average webpage rendering is unlikely to reach.

//Mikael

Xaxo wrote:


On Feb 27, 6:48 pm, Mikael Johansson <[email protected]> wrote:
Hi,

The 2.2.x branch had the same performance improvement but it wasn't
released until a few minutes ago :) Try the 2.2.5 release, should give
you better performance when using the consistent hashing strategy.

The performance improvement people are talking about is:

http://cvs.php.net/viewvc.cgi/pecl/memcache/memcache_consistent_hash.c?r1=1.6&r2=1.7

it might bring you some performance improvement in the algorithm
selecting a server for a particular key. Nevertheless, the
implementation is slow and clumsy: it creates <weight> * 160 hashes
per server and then takes 1024 of them. One of these 1024 is chosen
upon a simple division <your hash> % 1024, which points back to a
server from your pool.

In the case of the standard algorithm, you create an array in which
every server contributes <weight> elements. You decide upon a server
again by a simple division <your key> % <array size>.

As you can see, the consistent algorithm has a great performance
impact due to:
 -  calculating <weight> * 160 hashes for each server
  - choosing 1024 out of all hashes

another negative impact, that you didn't saw is when you have a small
number of servers (3 - 4) and small weights (say 1), then the load on
the servers is unequally distributed, due to the fact that you have to
fill these 1024 places with say 640 elements. As the filling algorithm
works, some elements will take several places in the 1024 array,
therefore upon server selection via <key> % 1024, some servers will
have higher probability to be selected.

Having said this, if you need performance improvements:
  - try the patch mentioned above
  - try some other php client
  - look how others do server selection and implement it in some
present php client
  - design your own server selection algorithm and implement it
  - stick with the standard straight forward algorithm

 * The constants mentioned above are to be found in
http://cvs.php.net/viewvc.cgi/pecl/memcache/php_memcache.h?revision=1.39&view=markup
namely:
90      #define MMC_CONSISTENT_POINTS 160                       /* points per 
server */
91      #define MMC_CONSISTENT_BUCKETS 1024                     /* number of 
precomputed
buckets, should be power of 2 */

Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to