Hi everyone, I have a few Hadoop Streaming jobs whose reduce phase is heavy on both CPU and I/O, so they would benefit from running one reducer per node instead of two. I have a 30-node cluster and specify 30 reducers. When I review the job stats on the JobTracker, I do see 30 reducers queued/executing; however, those reducers are distributed across only 15 nodes (two per node), so only 50% of my cluster is being used.
After reviewing the Hadoop docs, I tried setting the following property when starting my streaming job, but it doesn't seem to have any effect:

-jobconf mapred.tasktracker.reduce.tasks.maximum=1

How do I tell Hadoop to run one reducer per node with streaming? Thanks in advance for your assistance! Regards, -Steven
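For reference, here is a sketch of the kind of invocation I'm using. The jar path, input/output paths, and mapper/reducer scripts are placeholders, not my actual job. (One possible wrinkle: mapred.tasktracker.reduce.tasks.maximum is normally read by each TaskTracker daemon from mapred-site.xml at startup, which might explain why passing it per-job appears to be ignored.)

```shell
# Sketch of the streaming job submission described above.
# Paths and script names are placeholders.
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming.jar \
    -input  /user/steven/input \
    -output /user/steven/output \
    -mapper  my_mapper.py \
    -reducer my_reducer.py \
    -jobconf mapred.reduce.tasks=30 \
    -jobconf mapred.tasktracker.reduce.tasks.maximum=1
```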
