Sorry, HDFS should have been HBase.

-sujit

On Fri, 2009-10-16 at 14:36 -0700, Sujit Pal wrote:
> Hi,
> 
> I have a situation where I need to "collect" data into some sort of
> common medium from a set of mapreduce jobs, then have another mapreduce
> job "consolidate" these to provide the final result. I was considering
> using some sort of database to store the output of the first stage and
> then read it (I need to be able to do random access on the keys) in
> the second stage.
> 
> I thought of using HDFS and a colleague suggested Apache Cassandra. Both
> seem to be implementations of BigTable. I read that HDFS is a file
> handle hog, but saw no such warning on the Cassandra site. Would it be
> preferable, in your opinion, to use one over the other? I suppose I
> should just try them both, but if someone has done this already, I would
> appreciate their input before doing so.
> 
> Thanks
> Sujit
> 
