Status: New
Owner: ----
Labels: Type-Defect Priority-Medium

New issue 256 by [email protected]: PLEASE do not remove cachedump, better dumping feature needed
http://code.google.com/p/memcached/issues/detail?id=256

Hi Guys.

Sorry for the long post; I wrote it to describe a very big problem with memcached (and a solution).

My company implemented an ad server that handles tens of millions of impressions daily by using memcached extensively. We use memcached both to cache data and to stage SQL writes. To my knowledge it is (as of today) the only available tool that can scale writes to SQL (redis is totally unusable because of its reclaim policy, and other K-V storage tools are out of the equation because they write data to disk).

So we run tens of thousands of writes per minute through memcached, then analyze the data every minute and write/update 100-200 SQL rows with aggregated data. We scaled the server from about 40-50 requests per second to more than 800, so it works great.

But we have a problem related to LRU/"lazy" reclaim. The cache fills all available memory, and then there are evictions because the keys have different expiry times (some of them just 5 seconds, others 24 hours).

As a workaround we used cachedump to get a list of keys, then issued a GET command for each key so that expired keys are reclaimed immediately. It works; the only problem is that we can't, e.g., dump all 10 million keys, because the dump is limited.
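The workaround above could look roughly like the sketch below. It assumes a memcached server that speaks the text protocol and still answers `stats items` and `stats cachedump`; the helper names (`parse_slab_ids`, `parse_cachedump_keys`, `touch_dumped_keys`), the host/port defaults and the limit value are made up for illustration and are not what we actually run.

```python
import re
import socket

# Reply line shapes as printed by the text protocol (assumed, not exhaustive).
SLAB_RE = re.compile(r"^STAT items:(\d+):number \d+$")
ITEM_RE = re.compile(r"^ITEM (\S+) \[(\d+) b; (\d+) s\]$")


def parse_slab_ids(stats_items_reply):
    """Extract slab class ids from a 'stats items' reply."""
    ids = []
    for line in stats_items_reply.splitlines():
        match = SLAB_RE.match(line.strip())
        if match:
            ids.append(int(match.group(1)))
    return ids


def parse_cachedump_keys(dump_reply):
    """Extract the key names from a 'stats cachedump' reply."""
    keys = []
    for line in dump_reply.splitlines():
        match = ITEM_RE.match(line.strip())
        if match:
            keys.append(match.group(1))
    return keys


def _command(sock, cmd):
    """Send one command and read until END or ERROR.

    Simplification: a stored value that itself ends in 'END\\r\\n'
    could end the read early.
    """
    sock.sendall(cmd.encode() + b"\r\n")
    buf = b""
    while not buf.endswith((b"END\r\n", b"ERROR\r\n")):
        chunk = sock.recv(4096)
        if not chunk:
            break
        buf += chunk
    return buf.decode(errors="replace")


def touch_dumped_keys(host="127.0.0.1", port=11211, limit=1000):
    """GET every key the dump returns so expired ones are reclaimed lazily."""
    touched = 0
    with socket.create_connection((host, port), timeout=5) as sock:
        for slab_id in parse_slab_ids(_command(sock, "stats items")):
            dump = _command(sock, f"stats cachedump {slab_id} {limit}")
            for key in parse_cachedump_keys(dump):
                _command(sock, f"get {key}")  # the value is discarded
                touched += 1
    return touched
```

The dump only returns as many items as the server is willing to send, so a loop like this never reaches every key, which is exactly the limitation described above.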

To see how bad it is without this kind of "fast" reclaim: after 20-30 hours we have about 2GB of outdated keys that occupy just one slab. So we can't accommodate different traffic patterns, because all slabs are taken. Meanwhile the non-expired set is about 30MB, so 1970MB out of 2000MB is wasted. So even with RAM 66 TIMES bigger than actually "needed", without cachedump we'd still get evictions.
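For reference, the arithmetic behind those numbers, as a tiny sketch (the figures are the rounded ones quoted above, and the variable names are just for illustration):

```python
total_mb = 2000  # memory held by the slab, about 2GB
live_mb = 30     # non-expired data actually needed

wasted_mb = total_mb - live_mb               # memory held by expired keys
wasted_percent = wasted_mb * 100 / total_mb  # share of memory that is wasted
oversize_factor = total_mb / live_mb         # RAM relative to what is needed

print(f"wasted: {wasted_mb} MB ({wasted_percent:.1f}%)")
print(f"RAM is about {int(oversize_factor)}x the live set")
```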

So could you please treat an "improved dump" as a much-needed feature request? I have seen posts by many other people asking about this. Maybe include a command-line option to turn it ON if you're concerned about security?

If this is not the appropriate place to make feature requests, can you please direct me to the right one?

Maybe it would be possible to add a separate low-priority thread that scans the key list and issues a GET from time to time. I'm a C++ coder; how hard would that be to implement? Would it require a partial or full lock on some important shared resource (like the whole item list), which would make it problematic? Or maybe it would be possible to fork the process (copy-on-write) so the child has access to the whole list and then just issues GETs to the parent using the text protocol?
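To make the thread idea concrete, here is a toy model in Python. It is not memcached's internals: `ExpiringCache`, `sweep` and `start_sweeper` are hypothetical names, and a plain dict stands in for the item list. It only illustrates taking the lock once per small batch instead of for the whole scan.

```python
import threading
import time


class ExpiringCache:
    """Toy cache with lazy expiry on get() plus an explicit sweep()."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._lock = threading.Lock()
        self._items = {}  # key -> (value, expires_at)

    def set(self, key, value, ttl):
        with self._lock:
            self._items[key] = (value, self._clock() + ttl)

    def get(self, key):
        with self._lock:
            entry = self._items.get(key)
            if entry is None:
                return None
            value, expires_at = entry
            if expires_at <= self._clock():
                del self._items[key]  # lazy reclaim, only on access
                return None
            return value

    def sweep(self, batch_size=1000):
        """Reclaim expired keys, holding the lock one batch at a time."""
        with self._lock:
            # The snapshot is still O(n) under the lock; a real server
            # would walk its own item list incrementally instead.
            keys = list(self._items)
        reclaimed = 0
        for start in range(0, len(keys), batch_size):
            with self._lock:
                now = self._clock()
                for key in keys[start:start + batch_size]:
                    entry = self._items.get(key)
                    if entry is not None and entry[1] <= now:
                        del self._items[key]
                        reclaimed += 1
        return reclaimed

    def __len__(self):
        with self._lock:
            return len(self._items)


def start_sweeper(cache, interval=1.0):
    """Run cache.sweep() every `interval` seconds in a daemon thread."""
    stop_event = threading.Event()

    def run():
        # wait() returns True once stop_event is set, which ends the loop.
        while not stop_event.wait(interval):
            cache.sweep()

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread, stop_event
```

With a 5-second key and a 24-hour key stored, a sweep after 10 seconds removes only the short-lived one, and no client GET is needed to trigger the reclaim.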

Thanks,
Slawomir.
