On 26-11-2011 19:28 Les Mikesell wrote:
> On Sat, Nov 26, 2011 at 7:15 AM, Arjen van der Meijden <[email protected]> wrote:
>> Wouldn't more servers become increasingly slower (as seen from the
>> application) as you force your clients to connect to more of them?
>> Assuming all machines have enough processing power and network
>> bandwidth, I'd expect performance of the last of these variants to be
>> best:
>> 16x 1GB machines
>> 8x 2GB machines
>> 4x 4GB machines
>> 2x 8GB machines
>> 1x 16GB machine
>> In the first one you may end up with 16 different TCP/IP connections
>> per client. Obviously, connection pooling and proxies can alleviate
>> some of that overhead. Still, a multi-get might actually hit all 16
>> servers.
>
> That doesn't make sense. Why would you expect 16 servers acting in
> parallel to be slower than a single server? And in many/most cases the
> application will also be spread over multiple servers, so the load is
> distributed independently there as well.
Why not? Will it really be in parallel, given that most application code
is fairly linear (i.e. any parallelism will have to come from the client
library)? Even with true parallelism, you still have to connect to all
the servers, be hindered by TCP slow start, etc. (a connection pool may
help here). I'm just wondering whether the connection and other TCP/IP
overheads will be outweighed by any load-spreading gains, especially
since memcached's part of the job is fairly quick.
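To make that fan-out concrete, here's a minimal sketch of how many distinct servers a single multi-get decomposes into. It assumes a simple modulo-hash placement for illustration; real memcached clients typically use consistent hashing (e.g. ketama), but the fan-out effect is the same:

```python
import hashlib

def server_for(key, n_servers):
    # Hypothetical modulo-hash placement, purely for illustration;
    # production clients usually use consistent hashing instead.
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % n_servers

def fanout(keys, n_servers):
    # Number of distinct servers one multi-get must contact, i.e. the
    # number of separate network requests it decomposes into.
    return len({server_for(k, n_servers) for k in keys})

keys = ["user:%d" % i for i in range(20)]
for n in (1, 2, 4, 8, 16):
    print("%2d servers -> %2d requests per multi-get" % (n, fanout(keys, n)))
```

With 20 keys, the single-server case is always one request, while the 16-server case will usually touch most of the pool, which is the overhead being questioned above.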
Here's another variant on my question I hadn't even thought about:
http://highscalability.com/blog/2009/10/26/facebooks-memcached-multiget-hole-more-machines-more-capacit.html
And here's Dormando's response to that:
http://dormando.livejournal.com/521163.html
So his post also suggests it might not be a good idea to issue small
requests to many servers rather than large requests to a few.
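A back-of-the-envelope model of the "multiget hole" those two posts discuss: when each client's multi-get is large enough to touch every server, adding servers divides the data but barely reduces the number of requests each server must answer. This is only a sketch with assumed numbers (1000 clients, 100 keys per multi-get), using a balls-in-bins estimate for how many servers one multi-get hits:

```python
def requests_per_server(clients, keys_per_get, n_servers):
    # Expected number of distinct servers hit by one multi-get of
    # keys_per_get uniformly distributed keys (balls-in-bins estimate).
    hit = n_servers * (1 - (1 - 1 / n_servers) ** keys_per_get)
    # Each hit server answers one sub-request per client multi-get, so
    # per-server load is total sub-requests divided by server count.
    return clients * hit / n_servers

# Doubling the pool repeatedly hardly changes the per-server load:
for n in (2, 4, 8, 16):
    print("%2d servers -> ~%.0f requests/server"
          % (n, requests_per_server(1000, 100, n)))
```

Under these assumptions every row comes out near 1000 requests per server, which is the point of the "more machines != more capacity" argument.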
Best regards,
Arjen