Hi,

I have a situation where I need to "collect" data into some sort of
common store from a set of mapreduce jobs, then have another mapreduce
job "consolidate" it to produce the final result. I was considering
using some sort of database to hold the output of the first stage and
then read it back (I need random access by key) in the second stage.
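
To make the pattern concrete, here is a minimal sketch of the two-stage flow. The stdlib dbm file below is only a stand-in for the shared store (HDFS files or a Cassandra column family would take its place); the stage-1 job outputs are made-up example data:

```python
import dbm
import os
import tempfile

# Shared key-value store standing in for HDFS/Cassandra.
store_path = os.path.join(tempfile.mkdtemp(), "collected")

# Stage 1: each "collect" job appends its partial counts into the store,
# merging with whatever an earlier job wrote under the same key.
stage1_outputs = [[("a", 1), ("b", 2)], [("a", 3), ("c", 4)]]
with dbm.open(store_path, "c") as store:
    for job_output in stage1_outputs:
        for key, value in job_output:
            prev = int(store.get(key, b"0"))
            store[key] = str(prev + value).encode()

# Stage 2: the "consolidate" job looks records up by key (random access)
# and produces the final result.
with dbm.open(store_path, "r") as store:
    totals = {k.decode(): int(store[k]) for k in store.keys()}

print(totals)  # {'a': 4, 'b': 2, 'c': 4}
```

The only property the real store needs to preserve from this sketch is keyed reads in stage 2; a plain sequence file on HDFS would force a scan instead.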

I thought of using HDFS, and a colleague suggested Apache Cassandra.
Both trace back to Google's papers (HDFS follows the GFS design, while
Cassandra borrows its data model from BigTable), but neither is a drop-in
BigTable. I read that HDFS is a file handle hog, yet found no such
caveat on the Cassandra site. Would it be preferable, in your opinion,
to use one over the other? I suppose I should just try them both, but if
someone has done this already, I would appreciate their input first.

Thanks
Sujit
