Re: Storing large files for later processing through hadoop

Wilm Schumacher Fri, 02 Jan 2015 09:40:18 -0800

Hi,

Perhaps I totally misunderstood your problem, but why "bother" with
Cassandra for storage in the first place?


If your MapReduce (MR) job runs only once per file (as you wrote
above), why not copy the data directly to HDFS, run the MR job, and use
Cassandra as the sink?

As HDFS and YARN are more or less completely independent, you could
perhaps run the ResourceManager (YARN) as well as the NameNode and a
DataNode (HDFS) on the "master", launch your MR job directly and, as
mentioned, use Cassandra as the sink for the reduced data. This way you
won't need dedicated hardware, as you need HDFS only temporarily: copy
the files in, process them, and delete them afterwards.
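
A rough sketch of what I mean, assuming the standard `hdfs dfs` and
`hadoop jar` command-line tools are on the PATH. The file paths, the jar
name and the job class are made up for illustration, and the job itself
is assumed to write its reduced output to Cassandra:

```python
import subprocess
from typing import Callable, List

Command = List[str]


def build_pipeline(local_file: str, hdfs_dir: str,
                   job_jar: str, job_class: str) -> List[Command]:
    """Build the copy -> process -> delete steps as CLI commands.

    All argument values are hypothetical placeholders; the MR job is
    assumed to use Cassandra as its sink, so no HDFS output is kept.
    """
    staging = hdfs_dir.rstrip("/")
    hdfs_input = staging + "/input"
    return [
        # 1. Create a temporary staging directory in HDFS.
        ["hdfs", "dfs", "-mkdir", "-p", hdfs_input],
        # 2. Copy the large file from local disk into HDFS.
        ["hdfs", "dfs", "-put", local_file, hdfs_input],
        # 3. Run the MR job once on that input.
        ["hadoop", "jar", job_jar, job_class, hdfs_input],
        # 4. Delete the staged files; they are not needed again.
        ["hdfs", "dfs", "-rm", "-r", "-skipTrash", staging],
    ]


def run_pipeline(commands: List[Command],
                 runner: Callable[[Command], object] = subprocess.check_call) -> None:
    """Run the steps in order; check_call raises on the first failure,
    so the staged files are only deleted after the job succeeded."""
    for cmd in commands:
        runner(cmd)
```

Calling `run_pipeline(build_pipeline(...))` on the "master" would then
run all four steps in sequence. If the job fails, the files stay in
HDFS, so you can retry without copying them again.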

Best wishes,

Wilm
