We have lots of servers but a limited storage pool. My map jobs handle lots of small input files (approx. 300 MB compressed each), but the reduce input is huge (about 100 GB) and requires a lot of temporary local storage. I would like to divide my server pool into two sets: one with small disks (for the map tasks) and a few machines with big storage (for the combine and reduce tasks).
Is there something I can do to force the reduce tasks to run on specific nodes? I have searched Google and some forums but found nothing.

Best regards,
Raj

--
View this message in context: http://old.nabble.com/Seperate-Server-Sets-for-Map-and-Reduce-tp29216327p29216327.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
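(A sketch of one way such a split can be expressed, assuming the classic MRv1 TaskTracker model: each node's mapred-site.xml sets per-node slot counts, so giving the small-disk nodes zero reduce slots keeps reduces off them. The slot values below are illustrative, not recommendations.)

```xml
<!-- mapred-site.xml on a small-disk, map-only node (MRv1 sketch) -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>8</value>   <!-- this node runs map tasks -->
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>0</value>   <!-- zero reduce slots: reduces never land here -->
</property>
```

On the big-storage nodes, the settings would be mirrored (a low or zero map maximum, a nonzero reduce maximum), so the JobTracker can only schedule reduces where the large local disks are.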
