[jira] Updated: (HADOOP-3332) improving the logging during shuffling

Runping Qi (JIRA) Wed, 07 May 2008 22:51:19 -0700

     [ https://issues.apache.org/jira/browse/HADOOP-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Runping Qi updated HADOOP-3332:
-------------------------------

    Description: 
Below is an excerpt from the log file of a reducer. 
The same set of messages about the fetch schedule is logged every second. 
Yet the critical information --- which hosts were slow --- is missing. 

  
2008-05-01 00:33:13,215 INFO org.apache.hadoop.mapred.ReduceTask: 
task_200804302255_0002_r_000720_0 Need another 3 map output(s) where 1 is 
already in progress 
2008-05-01 00:33:14,216 INFO org.apache.hadoop.mapred.ReduceTask: 
task_200804302255_0002_r_000720_0: Got 0 new map-outputs & 0 obsolete 
map-outputs from tasktracker and 0 map-outputs from previous failures 
2008-05-01 00:33:14,216 INFO org.apache.hadoop.mapred.ReduceTask: 
task_200804302255_0002_r_000720_0 Got 2 known map output location(s); 
scheduling... 
2008-05-01 00:33:14,216 INFO org.apache.hadoop.mapred.ReduceTask: 
task_200804302255_0002_r_000720_0 Scheduled 0 of 2 known outputs (2 slow hosts 
and 0 dup hosts) 
2008-05-01 00:33:14,216 INFO org.apache.hadoop.mapred.ReduceTask: 
task_200804302255_0002_r_000720_0 Need another 3 map output(s) where 1 is 
already in progress 
2008-05-01 00:33:15,217 INFO org.apache.hadoop.mapred.ReduceTask: 
task_200804302255_0002_r_000720_0: Got 0 new map-outputs & 0 obsolete 
map-outputs from tasktracker and 0 map-outputs from previous failures 
2008-05-01 00:33:15,217 INFO org.apache.hadoop.mapred.ReduceTask: 
task_200804302255_0002_r_000720_0 Got 2 known map output location(s); 
scheduling... 
2008-05-01 00:33:15,217 INFO org.apache.hadoop.mapred.ReduceTask: 
task_200804302255_0002_r_000720_0 Scheduled 0 of 2 known outputs (2 slow hosts 
and 0 dup hosts) 
2008-05-01 00:33:15,217 INFO org.apache.hadoop.mapred.ReduceTask: 
task_200804302255_0002_r_000720_0 Need another 3 map output(s) where 1 is 
already in progress 
2008-05-01 00:33:16,218 INFO org.apache.hadoop.mapred.ReduceTask: 
task_200804302255_0002_r_000720_0: Got 0 new map-outputs & 0 obsolete 
map-outputs from tasktracker and 0 map-outputs from previous failures 


       Priority: Critical  (was: Major)

It turns out that this problem is much more severe than it first appeared.
For any reasonably sized job where shuffling takes some time, the
userlog/syslog file of each reducer task may grow unreasonably large
(0.5 GB, say). This places a heavy burden on HOD when it harvests the log
files while deallocating a cluster. Also, if those log files are archived
on a DFS (as HOD does now), the space requirements on the DFS will be
quite significant.
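
One possible direction, shown only as a rough sketch and not as the actual
fix for this issue: name the slow hosts in the scheduling message, and write
a status line only when it changes or when a minimum interval has passed.
The class ShuffleLogThrottle, its methods, and the host names used below are
all hypothetical; nothing here is existing ReduceTask code.

```java
import java.util.Collection;
import java.util.TreeSet;

/** Hypothetical helper for illustration; not part of Hadoop. */
final class ShuffleLogThrottle {
    private final long minIntervalMs;   // minimum gap between identical lines
    private String lastMessage = null;  // last status line actually written
    private long lastLoggedAtMs = 0L;   // time the last line was written
    private int suppressed = 0;         // repeats dropped since the last write

    ShuffleLogThrottle(long minIntervalMs) {
        this.minIntervalMs = minIntervalMs;
    }

    /** Builds the scheduling status line and names the slow hosts in it. */
    static String describe(int scheduled, int known,
                           Collection<String> slowHosts, int dupHosts) {
        // Sort and de-duplicate so the same host set always yields the same text.
        TreeSet<String> hosts = new TreeSet<>(slowHosts);
        return "Scheduled " + scheduled + " of " + known + " known outputs ("
            + hosts.size() + " slow hosts " + hosts
            + " and " + dupHosts + " dup hosts)";
    }

    /**
     * Returns the line to write, or null when the message repeats the last
     * written one and the minimum interval has not yet elapsed.
     */
    synchronized String filter(String message, long nowMs) {
        boolean changed = !message.equals(lastMessage);
        boolean due = nowMs - lastLoggedAtMs >= minIntervalMs;
        if (!changed && !due) {
            suppressed++;
            return null;
        }
        String out = message;
        if (suppressed > 0) {
            // Record how many repeats were dropped so no information is lost.
            out = message + " (suppressed " + suppressed
                + " repeated status line(s))";
        }
        lastMessage = message;
        lastLoggedAtMs = nowMs;
        suppressed = 0;
        return out;
    }
}
```

A caller would pass each candidate status line through filter() together with
the current time and hand only non-null results to the logger. With a
60-second interval, the once-per-second repeats in the excerpt above would
collapse to roughly one line per minute while the task state is unchanged.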


> improving the logging during shuffling
> --------------------------------------
>
>                 Key: HADOOP-3332
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3332
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Runping Qi
>            Priority: Critical
>
> Below is an excerpt from the log file of a reducer. 
> The same set of messages about the fetch schedule is logged every second. 
> Yet the critical information --- which hosts were slow --- is missing. 
>   
> 2008-05-01 00:33:13,215 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804302255_0002_r_000720_0 Need another 3 map output(s) where 1 is 
> already in progress 
> 2008-05-01 00:33:14,216 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804302255_0002_r_000720_0: Got 0 new map-outputs & 0 obsolete 
> map-outputs from tasktracker and 0 map-outputs from previous failures 
> 2008-05-01 00:33:14,216 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804302255_0002_r_000720_0 Got 2 known map output location(s); 
> scheduling... 
> 2008-05-01 00:33:14,216 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804302255_0002_r_000720_0 Scheduled 0 of 2 known outputs (2 slow 
> hosts and 0 dup hosts) 
> 2008-05-01 00:33:14,216 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804302255_0002_r_000720_0 Need another 3 map output(s) where 1 is 
> already in progress 
> 2008-05-01 00:33:15,217 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804302255_0002_r_000720_0: Got 0 new map-outputs & 0 obsolete 
> map-outputs from tasktracker and 0 map-outputs from previous failures 
> 2008-05-01 00:33:15,217 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804302255_0002_r_000720_0 Got 2 known map output location(s); 
> scheduling... 
> 2008-05-01 00:33:15,217 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804302255_0002_r_000720_0 Scheduled 0 of 2 known outputs (2 slow 
> hosts and 0 dup hosts) 
> 2008-05-01 00:33:15,217 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804302255_0002_r_000720_0 Need another 3 map output(s) where 1 is 
> already in progress 
> 2008-05-01 00:33:16,218 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804302255_0002_r_000720_0: Got 0 new map-outputs & 0 obsolete 
> map-outputs from tasktracker and 0 map-outputs from previous failures 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
