Hi,

perhaps I totally misunderstood your problem, but why "bother" with
cassandra for storing in the first place?

If your MR for hadoop is only run once for each file (as you wrote
above), why not copy the data directly to hdfs, run your MR job and use
cassandra as sink?

As hdfs and yarn are more or less completely independent you could
perhaps use the "master" as ResourceManager (yarn) AND NameNode and
DataNode (hdfs) and launch your MR job directly and as mentioned use
Cassandra as sink for the reduced data. By this you won't need dedicated
hardware, as you only need the hdfs once, process and delete the files
afterwards.

Best wishes,

Wilm

Reply via email to