Sorry, HDFS should have been HBase. -sujit
On Fri, 2009-10-16 at 14:36 -0700, Sujit Pal wrote:
> Hi,
>
> I have a situation where I need to "collect" data into some sort of
> common medium from a set of mapreduce jobs, then have another mapreduce
> job "consolidate" these to provide the final result. I was considering
> using some sort of database to store the output of the first stage and
> then read them (I need to be able to do random access on the keys) in
> the second stage.
>
> I thought of using HDFS and a colleague suggested Apache Cassandra. Both
> seem to be implementations of BigTable. I read that HDFS is a file
> handle hog, but no such thing on the Cassandra site. Would it be
> preferable, in your opinion, to use one over the other? I suppose I
> should just try them both, but if someone has done this already, would
> appreciate their input before doing this.
>
> Thanks
> Sujit
>