That pre-calculation of 100 values per server is exactly what I am talking about: it is done by every Apache instance, and if you multiply it by our hit rate (around 18 000 000 impressions per day), you get the picture. As far as I know, the PHP module does not use last.fm's implementation of the consistent hashing algorithm.
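
To make the cost concrete, here is a minimal Python sketch of the scheme described in the quoted message below: pre-calculate 100 points per server, sort them, then binary-search the sorted array with the hashed key. This is not the PECL module's code. The MD5-based hash, the `server-i` point naming, and the server addresses are assumptions made up for the example.

```python
import bisect
import hashlib

POINTS_PER_SERVER = 100  # figure quoted in the thread; real clients may differ


def hash_key(value):
    """Map a string to a 32-bit integer (MD5 is an arbitrary choice here)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def build_ring(servers):
    """Pre-calculate the ring: one hash per point, then one sort.

    Returns two parallel lists: the sorted points and the server owning each.
    """
    pairs = sorted(
        (hash_key("%s-%d" % (server, i)), server)
        for server in servers
        for i in range(POINTS_PER_SERVER)
    )
    points = [point for point, _ in pairs]
    owners = [server for _, server in pairs]
    return points, owners


def pick_server(points, owners, key):
    """Binary-search for the first point >= hash(key), wrapping to the start."""
    idx = bisect.bisect_left(points, hash_key(key))
    if idx == len(points):
        idx = 0  # past the last point: wrap around the ring
    return owners[idx]


if __name__ == "__main__":
    # Hypothetical addresses, 13 instances in total.
    servers = ["10.0.0.%d:11211" % n for n in range(1, 14)]
    points, owners = build_ring(servers)
    print(len(points), "points on the ring")
    print("user:42 ->", pick_server(points, owners, "user:42"))
```

With 13 servers, `build_ring` performs 1300 hash computations plus a sort, while `pick_server` needs one hash and a binary search. The lookup is cheap; what hurts is rebuilding the ring more often than necessary, which in this sketch would mean calling `build_ring` on every request rather than once per long-lived process.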

On 27 Feb, 18:10, Henrik Schröder <[email protected]> wrote:
> That is really weird, since the only difference between naive and consistent
> server selection is that for the consistent one, you pre-calculate an array
> of integers that holds 100 values per server during startup, and for your
> actual server selection, you do a binary search into this array with your
> hashed key, but that's a really trivial operation.
>
> However, I seem to remember that the PHP client uses an external C library,
> libketama or something, for doing the consistent server selection, this
> might cause a big overhead in your case compared to doing the naive
> selection which is probably implemented straight in PHP. I know that for the
> Perl clients, there's one in pure Perl, and one that also uses libketama,
> maybe there's something similar for PHP?
>
> /Henrik
>
> On Fri, Feb 27, 2009 at 15:50, Pavel Aleksandrov <[email protected]> wrote:
>
> > Hello, I am working for a big web site. We have around 9000 hits/s on
> > our MySQL replication trees and 500 000 unique visitors each day, just
> > to give a clue about the load we are experiencing. We run on MySQL,
> > Apache2, Gentoo, PHP 4 + PECL Memcache module. We've been using a
> > single 12G memcached instance for speeding up things (we've reached
> > the point where we can't solely rely on our DB). Using a single
> > instance is not what memcached is meant for, so we decided to scale
> > things up a bit, so we added 12 more instances, 2G each (32 bit
> > servers, 4 instances per server, 3 servers). Then we switched from the
> > "standard" (naive) method of hash distribution to the "consistent"
> > method.
> >
> > What happened was that the load on our web nodes (we have 3 of them)
> > went up about 3 times the usual. I'm guessing it's the new hash
> > distribution method that's doing this. Am I missing something, or is using
> > this method always so CPU intensive? Do we have another choice, or
> > should we invest in more web nodes to distribute the new load if we
> > decide to stick to the consistent hashing algorithm?
