Sean Shanny wrote:
Delip,
So far we have had pretty good luck with memcached. We are building a
Hadoop-based solution for data warehouse ETL on XML-based log files that
represent click stream data on steroids.
We process about 34 million records, or about 70 GB of data, a day. We have
to process dimensional data in our warehouse and then load the surrogate
<key><value> pairs into memcached so we can traverse the XML files once
again to perform the substitutions. We are using the memcached solution
because it scales out just like Hadoop. We will have code that allows
us to fall back to the DB if the memcached lookup fails, but that should
not happen too often.
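The lookup-with-fallback pattern described above can be sketched roughly as
follows. This is not their actual code: `cache` stands in for a memcached
client (a plain dict here, so in a real setup you'd call the client's
`get`/`set` instead of item assignment), and `db_lookup` is a hypothetical
surrogate-key query against the warehouse.

```python
def lookup_surrogate_key(natural_key, cache, db_lookup):
    """Return the surrogate key for natural_key, preferring the cache."""
    value = cache.get(natural_key)   # memcached GET in the real setup
    if value is not None:
        return value
    # Cache miss: fall back to the warehouse DB, then repopulate the
    # cache so the next pass over the XML files gets a hit.
    value = db_lookup(natural_key)
    cache[natural_key] = value       # client.set(...) with a real memcached
    return value
```

The point of repopulating on a miss is that the second traversal of the
XML files should then find almost everything in the cache, keeping DB
fallbacks rare.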
LinkedIn has just opened up something they run internally, Project
Voldemort:
http://highscalability.com/product-project-voldemort-distributed-database
http://project-voldemort.com/
It's a Java-based DHT. I haven't played with it yet, but it looks like
a good addition to the portfolio.