Hi Bob,

I also used memcached for metadata lookups in a Hadoop job. I started 15
memcached server instances on 7 nodes and saw about 1 million hits per
memcached server (in my case), yet it didn't perform up to my expectations.
So I switched to Tokyo Cabinet (a BDB-like file-based database) and it
performed well.
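The pattern above can be sketched briefly. This is a minimal illustration of a local file-based lookup store, using Python's stdlib dbm module as a stand-in for Tokyo Cabinet (a real job would use the Tokyo Cabinet bindings from each map task, but the access pattern is the same): build the store once, then do read-only gets per record.

```python
# Sketch of the local file-based metadata lookup pattern.
# Stdlib dbm stands in for Tokyo Cabinet here; the keys, values,
# and file names are illustrative, not from the actual job.
import dbm

# Build phase: write the metadata store to a local file once.
with dbm.open("metadata.db", "c") as db:
    db[b"user:42"] = b"eu-west"
    db[b"user:43"] = b"us-east"

# Lookup phase (conceptually, inside a mapper): open read-only
# and probe the store for each incoming record.
with dbm.open("metadata.db", "r") as db:
    region = db.get(b"user:42")
    print(region)  # b'eu-west'
```

Because the store is a plain local file, each task avoids a network round trip per lookup, which is where memcached cost us.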

I have written a white paper on Hadoop performance tuning. It includes a
case study in which I describe my complete scenario, approaches and
statistics. You can find the paper here:

http://www.impetus.com/impetusweb/whitepapers_main.jsp?download=HadoopPerformanceTuning.pdf


On Wed, Oct 7, 2009 at 5:07 PM, Paul Ingles <[email protected]> wrote:

> Hi Bob,
>
> I don't have much in the way of usage stats to go on. However, we went on a
> similar journey with some document clustering we were doing a while ago.
>
> We wanted to do some simple key/value lookups during the process, and
> started with the existing RDBMSs that already held the data. This didn't
> really cut it, so we decided to just throw memcached onto our nodes and give
> that a go. It didn't really perform as we were expecting it to (we already
> use it for a bunch of our web apps and it works really well so it was a
> surprise). It seemed a little difficult to predict when some records would
> fall out of the cache, and so we had spurious errors making it difficult to
> depend on. We had a tight timeline so decided to just move on.
>
> In the end we installed HBase and gave it a go. Despite a few teething
> problems, it's been pretty good since then and the distribution and
> (relative) reliability meant we've stuck with it. It just seems to work for
> that kind of workload pretty well. Although I would like to go back some
> time and really figure out why memcached didn't work.
>
> Dataset wise, it was approximately 20m records, or a couple of gigabytes
> worth of data.
>
> HTH,
> Paul
>
>
> On 7 Oct 2009, at 10:58, Bob Schulze wrote:
>
>  I need a cache that is read often by many nodes and written rarely by a
>> few nodes. It's not too big (200,000 to 2 million records / ~1 GB), but it
>> may be too big to fit on one node (so keeping local caches, or ZooKeeper,
>> is not an option).
>>
>> HBase is already in place for other applications; would I get any further
>> benefit (speed?) from using memcached instead (not on top of it, of
>> course), or would it only be one more piece of software to maintain?
>>
>> I have read the memcached docs and wiki and am reasonably familiar with
>> HBase, but would appreciate a good reason to pick one or the other. I am
>> asking on the hadoop list because I think M/R jobs also need this for
>> joins occasionally, and memcached is often recommended.
>>
>> Thx for any tips,
>>
>>        Bob
>>
>
>


-- 
Thanks & Regards,
Chandra Prakash Bhagtani,
