EC2 and MR Job question

Billy Pearson Sat, 14 Jun 2008 13:32:31 -0700

I have a question that someone may have answered here before, but I cannot find the answer.


Assume I have a cluster of servers hosting a large amount of data.

I want to run a large job whose maps take a lot of CPU power and whose reduces take only a small amount of CPU. I want to run the maps on a group of EC2 servers and run the reduces on the local cluster of 10 machines.

The problem I am seeing is with the map outputs: if I run the maps on EC2, they are stored locally on the instance. What I am looking to do is have the map output files stored in HDFS so I can kill the EC2 instances, since I do not need them for the reduces.

The only way I can think of to do this is to run two jobs: one with only the mappers, storing its output on HDFS, and then a second job to run the reduces from the map outputs stored on HDFS.
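
To make the two-job idea concrete, here is a minimal sketch in plain Python. It is not Hadoop code: a local directory stands in for HDFS, and `expensive_map`, `cheap_reduce`, `run_map_only_job`, and `run_reduce_job` are hypothetical names used only for illustration. In Hadoop terms, the first step would correspond to a job configured with zero reduce tasks, in which case the map output is written directly to the job's output path rather than kept on the worker's local disk.

```python
import os
import tempfile
from collections import defaultdict


def expensive_map(record):
    """Hypothetical CPU-heavy map function: emits (key, value) pairs."""
    for word in record.split():
        yield word.lower(), 1


def cheap_reduce(key, values):
    """Hypothetical lightweight reduce function."""
    return key, sum(values)


def run_map_only_job(records, out_dir, num_map_tasks=2):
    """Job 1: run the maps only and persist their output as part files.

    out_dir stands in for an HDFS directory (an assumption of this sketch),
    so nothing needed later stays on the map workers.
    """
    os.makedirs(out_dir, exist_ok=True)
    for task_id in range(num_map_tasks):
        path = os.path.join(out_dir, "part-%05d" % task_id)
        with open(path, "w") as out:
            # Each simulated map task handles every num_map_tasks-th record.
            for record in records[task_id::num_map_tasks]:
                for key, value in expensive_map(record):
                    out.write("%s\t%d\n" % (key, value))


def run_reduce_job(in_dir):
    """Job 2: read the stored map output, group by key, and reduce."""
    grouped = defaultdict(list)
    for name in sorted(os.listdir(in_dir)):
        with open(os.path.join(in_dir, name)) as part:
            for line in part:
                key, value = line.rstrip("\n").split("\t")
                grouped[key].append(int(value))
    return dict(cheap_reduce(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    records = ["EC2 maps are expensive", "reduces are cheap", "EC2 maps"]
    with tempfile.TemporaryDirectory() as shared_dir:
        run_map_only_job(records, shared_dir)
        # At this point the map workers could be shut down.
        print(run_reduce_job(shared_dir))
```

The point of the split is that the second step depends only on the files in the shared directory, so the machines that ran the first step are no longer needed once it finishes.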

Is there a way to make the mappers store their final output in HDFS?
