Hi Bob, I also used memcached for metadata lookup in a Hadoop job. I started 15 memcached server instances on 7 nodes. I saw about 1 million hits per memcached server (in my case), yet it didn't perform up to my expectations. So I switched to Tokyo Cabinet (a BDB-like, file-based database) and it performed well.
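The appeal of a Tokyo Cabinet-style store in this scenario is that each task reads from a local file instead of making a network round trip per lookup. A minimal sketch of that pattern using Python's standard-library dbm module (the Tokyo Cabinet API itself is not shown, and the file name and keys here are made up for illustration; in practice the file would be shipped to every node, e.g. via Hadoop's distributed cache):

```python
import dbm

# Build the metadata store once, before the job runs.
with dbm.open("metadata.db", "c") as db:
    db[b"user:42"] = b"region=eu;tier=gold"

# Inside a map task: open read-only and look keys up locally,
# with no network hop and no cache eviction to worry about.
with dbm.open("metadata.db", "r") as db:
    value = db[b"user:42"]
    print(value.decode())  # prints: region=eu;tier=gold
```

The trade-off versus memcached is disk space and distribution effort instead of network latency and eviction unpredictability.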
I have written a white paper on Hadoop performance tuning. There is a case study in which I have described my complete scenario, approaches and statistics. You can find the paper here: http://www.impetus.com/impetusweb/whitepapers_main.jsp?download=HadoopPerformanceTuning.pdf

On Wed, Oct 7, 2009 at 5:07 PM, Paul Ingles <[email protected]> wrote:

> Hi Bob,
>
> I don't have much in the way of usage stats to go on. However, we went on a similar journey with some document clustering we were doing a while ago.
>
> We wanted to do some simple key/value lookups during the process, and started by using the existing RDBMSs that held the data already. This didn't really cut it, so we decided to just throw Memcached onto our nodes and give that a go. It didn't perform as we were expecting (we already use it for a bunch of our web apps, where it works really well, so that was a surprise). It seemed difficult to predict when some records would fall out of the cache, and the resulting spurious errors made it hard to depend on. We had a tight timeline, so we decided to just move on.
>
> In the end we installed HBase and gave it a go. Despite a few teething problems, it's been pretty good since then, and the distribution and (relative) reliability mean we've stuck with it. It just seems to work well for that kind of workload, although I would like to go back some time and really figure out why memcached didn't work.
>
> Dataset-wise, it was approximately 20m records, or a couple of gigabytes' worth of data.
>
> HTH,
> Paul
>
> On 7 Oct 2009, at 10:58, Bob Schulze wrote:
>
>> I need a cache that is read by many nodes often and written by a few nodes rarely. It's not too big in size (200,000-2M records / 1 GB), but it may be too big to fit into one node (so keeping local caches - or ZooKeeper - is not an option).
>>
>> There is HBase in place already for other applications. Do I get a further benefit (faster?) from using memcached (instead of HBase, not on top of it, of course), or would it only be one more piece of software to maintain?
>>
>> I read the memcached docs & wiki and am reasonably familiar with HBase, but would appreciate a good reason to use one or the other. I am asking on the Hadoop list because I think M/R jobs also need this for joins occasionally, and memcached is recommended often.
>>
>> Thx for any tips,
>>
>> Bob

--
Thanks & Regards,
Chandra Prakash Bhagtani,
