It looks like that is the reducer that is actually doing the work, with 14M input records.
Reduce input groups     1
Combine output records  0
Reduce shuffle bytes    5,135,004,496
Reduce output records   14,232,592
Spilled Records         14,232,592
Combine input records   0
Reduce input records    14,232,592

Other reducers have this:

Reduce output records   0
Spilled Records         0
Combine input records   0
Reduce input records    0

-Ayon

________________________________
From: Raimon Bosch
Sent: Saturday, October 22, 2011 6:01 PM
Subject: why one of the reducers it's always slower?

Hi all,

I'm executing a job that converts logs into Hive tables. The times are very good now that we have added a proper number of nodes, but the reduce phase always takes longer on one of the machines.

Task                             Complete  Status           Start Time            Finish Time                          Counters
task_201110211442_0086_r_000000  100.00%   reduce > reduce  23-Oct-2011 00:26:42  23-Oct-2011 00:28:09 (1mins, 27sec)  9
task_201110211442_0086_r_000001  100.00%   reduce > reduce  23-Oct-2011 00:26:42  23-Oct-2011 00:28:10 (1mins, 27sec)  9
task_201110211442_0086_r_000002  100.00%   reduce > reduce  23-Oct-2011 00:26:43  23-Oct-2011 00:28:10 (1mins, 27sec)  9
task_201110211442_0086_r_000003  100.00%   reduce > reduce  23-Oct-2011 00:26:43  23-Oct-2011 00:28:10 (1mins, 27sec)  9
task_201110211442_0086_r_000004  100.00%   reduce > reduce  23-Oct-2011 00:26:44  23-Oct-2011 00:35:56 (9mins, 11sec)  10
task_201110211442_0086_r_000005  100.00%   reduce > reduce  23-Oct-2011 00:26:44  23-Oct-2011 00:28:09 (1mins, 24sec)  9

As you can see in these statistics, of the 6 reduce tasks one takes 9 minutes while the rest take just over a minute. I think this is because one of the reducers has to spend time sorting the results from the rest of the nodes. Is there a way to reduce this time?

Thanks in advance,
Raimon Bosch
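The counters in the reply point at key skew rather than sorting: "Reduce input groups 1" on the slow reducer means it received a single key group carrying all 14,232,592 records, while the other reducers received none. Assuming the job used Hadoop's default HashPartitioner (the thread does not say which partitioner was configured), each record goes to reducer `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`, so every record with the same key lands on the same reducer. The standalone sketch below re-implements only that formula and does not use the Hadoop API; the class `PartitionSkewDemo`, its helper methods and the sample keys are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical standalone demo: it does NOT import Hadoop, it only mirrors
// the partition formula used by Hadoop's default HashPartitioner.
class PartitionSkewDemo {

    // HashPartitioner rule: clear the sign bit, then take the remainder
    // modulo the number of reduce tasks, giving a value in [0, numReduceTasks).
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Counts how many records each reducer would receive for the given keys.
    static long[] recordsPerReducer(List<String> keys, int numReduceTasks) {
        long[] counts = new long[numReduceTasks];
        for (String key : keys) {
            counts[partitionFor(key, numReduceTasks)]++;
        }
        return counts;
    }

    // Number of reducers that receive at least one record.
    static int busyReducers(long[] counts) {
        int busy = 0;
        for (long c : counts) {
            if (c > 0) {
                busy++;
            }
        }
        return busy;
    }

    public static void main(String[] args) {
        int reducers = 6; // six reduce tasks, as in the thread

        // Hypothetical input: every record shares one key, which is what
        // "Reduce input groups 1" suggests happened in this job.
        List<String> singleKey = Collections.nCopies(1000, "same-key");

        // Hypothetical input: 1000 records with 1000 distinct keys.
        List<String> manyKeys = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            manyKeys.add("key-" + i);
        }

        System.out.println("single key -> busy reducers: "
                + busyReducers(recordsPerReducer(singleKey, reducers)));
        System.out.println("many keys  -> busy reducers: "
                + busyReducers(recordsPerReducer(manyKeys, reducers)));
    }
}
```

With the single-key input, all 1000 records map to one reducer; with distinct keys they spread across several. Because the partition depends only on the key, adding nodes cannot split one key's records across reducers, which is consistent with five reducers finishing in about a minute and a half and one taking over 9 minutes.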
