Sean Shanny wrote:
Delip,

So far we have had pretty good luck with memcached. We are building a hadoop based solution for data warehouse ETL on XML based log files that represent click stream data on steroids.

We process about 34 million records, or about 70 GB of data, a day. We have to process dimensional data in our warehouse and then load the surrogate <key><value> pairs into memcached so we can traverse the XML files once again to perform the substitutions. We are using the memcached solution because it scales out just like hadoop. We will have code that allows us to fall back to the DB if the memcached lookup fails, but that should not happen too often.
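The cache-aside lookup with a DB fallback described above can be sketched roughly like this (the names `lookup_surrogate_key` and `db_lookup` are hypothetical, and a plain dict stands in for the memcached client):

```python
cache = {}  # stands in for a memcached client's get/set interface

def db_lookup(natural_key):
    # Hypothetical warehouse query that resolves a natural key
    # to its surrogate key.
    return "sk-" + natural_key

def lookup_surrogate_key(natural_key):
    value = cache.get(natural_key)
    if value is None:
        # Cache miss: fall back to the database, then repopulate
        # the cache so subsequent lookups hit.
        value = db_lookup(natural_key)
        cache[natural_key] = value
    return value
```

The point of the fallback is correctness, not speed: a miss costs one DB round trip, while the common case stays in memcached.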


LinkedIn has just opened up something they run internally, Project Voldemort:

http://highscalability.com/product-project-voldemort-distributed-database
http://project-voldemort.com/

It's a DHT, Java based. I haven't played with it yet, but it looks like a good part of the portfolio.
