@dormando, great response... this is almost exactly what I had in mind, i.e. grouping all of your memcached servers into logical pools so as to avoid hitting all of them for every request. In fact, a reasonable design for a very large server installation would be to aim for, say, a 10-20% node hit for every request (or even less if you can manage it).
So with the Facebook example, we know there's a point you get to where a high node count means all sorts of problems; in that case it was 800, I think (correct me if I'm wrong), proving the point that logical groupings should be the way to go for large pools. In fact, I would suggest groupings with varying levels of granularity, as long as your app keeps a simple and clear means of narrowing down to a small cross-section of servers without losing any of the intended benefits of caching in the first place. In short, most of my anxieties have been well addressed (both theoretical and practical)... +1 for posting this in a wiki, Dormando. Thanks @dormando @henrik @les (oh, and @arjen)

-m.

On 26 November 2011 22:34, dormando <[email protected]> wrote:

> > @Les, you make a clear and concise point. thnx.
> >
> > In this thread, I'm really keen on exploring a theoretical possibility
> > (that could become very practical for very large installations):
> >
> > -- at what node count (for a given pool) may/could we start to
> > experience problems related to performance (server, network or even
> > client), assuming a near-perfect hardware/network set-up?
>
> I think the really basic theoretical response is:
>
> - If your request will easily fit in the TCP send buffer and immediately
> transfer out the network card, it's best if it hits a single server.
> - If your requests are large, you can get lower latency responses by not
> waiting on the TCP socket.
> - Then there's some fiddling in the middle.
> - Each time a client runs "send" that's a syscall, so more do suck, but
> keep in mind the above tradeoff: a little system CPU time vs waiting for
> TCP ACKs.
>
> In reality it doesn't tend to matter that much. The point of my response
> to the Facebook "multiget hole" is that you can tell clients to group keys
> to specific servers or subsets of servers (like all keys related to a
> particular user), so you can have a massive pool and still generally avoid
> contacting all of them on every request.
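To make sure I follow the grouping idea, here's a rough Python sketch of how I picture it (my own sketch, not taken from any real client): hash each user to a small, fixed sub-pool of servers, then hash individual keys only within that sub-pool, so a multiget for one user touches at most a handful of nodes no matter how big the full pool gets. The server list, GROUP_SIZE and key names are all made up.

import hashlib

ALL_SERVERS = ["10.0.0.%d:11211" % i for i in range(1, 101)]  # 100-node pool
GROUP_SIZE = 4  # each user's keys live on at most 4 servers

def _hash(s):
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

def servers_for_user(user_id):
    """Pick a deterministic sub-pool of GROUP_SIZE servers for this user."""
    start = _hash("user:%s" % user_id) % len(ALL_SERVERS)
    return [ALL_SERVERS[(start + i) % len(ALL_SERVERS)] for i in range(GROUP_SIZE)]

def server_for_key(user_id, key):
    """Hash the individual key, but only within the user's sub-pool."""
    group = servers_for_user(user_id)
    return group[_hash(key) % len(group)]

# A multiget of one user's keys now maps to at most GROUP_SIZE servers:
keys = ["user:42:%s" % field for field in ("name", "friends", "feed", "settings")]
print({k: server_for_key(42, k) for k in keys})

A real client would presumably layer this on top of consistent hashing so that resizing the pool doesn't remap every group, but the shape of the idea should be the same.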
> > -- if a memcached client were to pool, say, 2,000 or 20,000 connections
> > (again, theoretical but not entirely impractical given the rate of
> > internet growth), would that not inject enough overhead -- connection or
> > otherwise -- on the client side to, say, warrant a direct fetch from the
> > database? In such a case, we would have established a *theoretical*
> > maximum number of nodes in a pool for that given client in near-perfect
> > conditions.
>
> The theory depends on your setup, of course:
>
> - Accessing the server hash takes no time (it's a hash); calculating it is
> the time-consuming part. We've seen clients misbehave and seriously slow
> things down by recalculating a consistent hash on every request. So long
> as you're internally caching the continuum, the lookups are free.
>
> - Established TCP sockets mostly just waste RAM, but don't generally slow
> things down. So for a client server, you can calculate the # of memcached
> instances * number of apache procs (or whatever) * the amount of memory
> overhead per TCP socket, compare that to the amount of RAM in the box, and
> there's your limit. If you're using persistent connections.
>
> - If you decide not to use persistent connections, and design your
> application so satisfying a page read would hit at *most* something like 3
> memcached instances, you can go much higher. Tune the servers for
> TIME_WAIT reuse, higher local ports, etc., which deals with the TCP churn.
> Connections are established on first use, then reused until the end of the
> request, so the TCP SYN/ACK cycle for 1-3 (or even more) instances won't
> add up to much. Pretending you can have an infinite number of servers on
> the same L2 segment, you would likely be limited purely by bandwidth, or by
> the amount of memory required to load the consistent hash for clients.
> Probably tens of thousands.
>
> - Or use UDP, if your data is tiny and you tune the fuck out of it.
> Typically it doesn't seem to be much faster, but I may get a boost out of
> it with some new linux syscalls.
>
> - Or (Matt/Dustin correct me if I'm wrong) you use a client design like
> spymemcached. The memcached binary protocol can actually allow many client
> instances to use the same server connections. Each client stacks commands
> in the TCP sockets like a queue (you could even theoretically add a few
> more connections if you block too long waiting for space), then they get
> responses routed to them off the same socket. This means you can use
> persistent connections, and generally have one socket per server instance
> for an entire app server. Many thousands should scale okay.
>
> - Remember Moore's law does grow computers very quickly. Maybe not as fast
> as the internet, but ten years ago you would have had 100-megabit, 2G-RAM
> memcached instances and needed an awful lot of them. Right now 10GbE is
> dropping in price, 100G+ RAM servers are more affordable, and the industry
> is already looking toward higher network rates. So as your company grows,
> you get opportunities to cut the instance count every few years.
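On the first bullet (caching the continuum), a minimal sketch of what I understand that to mean: building the ring hashes every server many times, but once the ring is cached, each key lookup is just one hash plus a binary search. The point count and server names below are arbitrary.

import bisect
import hashlib

POINTS_PER_SERVER = 160  # ketama-style replica points; arbitrary here

def _hash(s):
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class Continuum:
    def __init__(self, servers):
        # The expensive part: done once per pool change, then kept around.
        points = []
        for server in servers:
            for i in range(POINTS_PER_SERVER):
                points.append((_hash("%s-%d" % (server, i)), server))
        points.sort()
        self._hashes = [h for h, _ in points]
        self._servers = [s for _, s in points]

    def server_for(self, key):
        # The cheap part: one hash plus a binary search on the cached ring.
        i = bisect.bisect(self._hashes, _hash(key)) % len(self._hashes)
        return self._servers[i]

ring = Continuum(["10.0.0.%d:11211" % i for i in range(1, 101)])
print(ring.server_for("user:42:feed"))

Rebuilding those 100 x 160 points on every request would be exactly the client misbehaviour described above; keeping the ring as long-lived state makes the per-key cost negligible.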
> > -- also, I would think the hashing algo would deteriorate after a given
> > number of nodes.. admittedly, this number could be very large indeed,
> > and also, I know this is unlikely in probably 99.999% of cases, but it
> > would be great to factor in the maths behind the science.
>
> I sorta answered this above. Should put this into a wiki page I guess...
>
> -Dormando
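And the socket-RAM limit from the second bullet really is just arithmetic; a throwaway calculation with made-up numbers shows how quickly per-worker persistent connections eat a client box's RAM once you're at a few thousand instances.

# Back-of-the-envelope version of: instances * worker procs * per-socket
# overhead, compared against the client box's RAM. All numbers invented.
memcached_instances = 2000       # size of the pool
workers_per_client = 100         # e.g. apache/fpm processes on one box
bytes_per_socket = 64 * 1024     # rough guess at kernel + client overhead

sockets = memcached_instances * workers_per_client
ram_for_sockets = sockets * bytes_per_socket
print("%d sockets, ~%.1f GB just for connections" % (sockets, ram_for_sockets / 2**30))

At 200,000 sockets and roughly 64 KB each that's around 12 GB on one client box, which is why per-worker persistent connections stop scaling long before the hash itself does.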