[ 
https://issues.apache.org/jira/browse/HADOOP-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12522843
 ] 

Devaraj Das commented on HADOOP-1764:
-------------------------------------

bq.  Given that I'm inclined to close this bug... thoughts?

+1

> Inconsistency between Mapper/Reducer book keeping
> -------------------------------------------------
>
>                 Key: HADOOP-1764
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1764
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>         Environment: Related: HADOOP-1763 (Same environment)
> Version: 0.15.0-dev, r565628
> Compiled: Tue Aug 14 20:55:37 UTC 2007 by hadoopqa
> 1400 Nodes
>            Reporter: Srikanth Kakani
>            Assignee: Arun C Murthy
>            Priority: Blocker
>
> Refer to HADOOP-1763.
> This occurs in that scenario: once many task trackers are lost, reducers no 
> longer know where the map outputs are located. They keep retrying the wrong 
> node, so the reducers run forever without failing.
> Relevant logs:
> Reducer output:
> 2007-08-21 09:47:47,046 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200708210155_0003_r_000006_2 Copying task_200708210155_0003_m_002598_0 
> output from node50
> 2007-08-21 09:47:53,643 WARN org.apache.hadoop.mapred.ReduceTask: 
> task_200708210155_0003_r_000006_2 copy failed: 
> task_200708210155_0003_m_002598_0 from node50
> 2007-08-21 09:47:53,643 WARN org.apache.hadoop.mapred.ReduceTask: 
> java.io.FileNotFoundException: 
> http://wm511750.inktomisearch.com:50060/mapOutput?map=task_200708210155_0003_m_002598_0&reduce=6
>       at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1243)
>       at 
> org.apache.hadoop.mapred.MapOutputLocation.getFile(MapOutputLocation.java:207)
>       at 
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:673)
>       at 
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:631)
> 2007-08-21 09:53:02,327 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200708210155_0003_r_000006_2 Copying task_200708210155_0003_m_002598_0 
> output from node50
> 2007-08-21 09:53:02,333 WARN org.apache.hadoop.mapred.ReduceTask: 
> task_200708210155_0003_r_000006_2 copy failed: 
> task_200708210155_0003_m_002598_0 from node50
> 2007-08-21 09:53:02,333 WARN org.apache.hadoop.mapred.ReduceTask: 
> java.io.FileNotFoundException: 
> http://node50:50060/mapOutput?map=task_200708210155_0003_m_002598_0&reduce=6
>       at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1243)
>       at 
> org.apache.hadoop.mapred.MapOutputLocation.getFile(MapOutputLocation.java:207)
>       at 
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:673)
>       at 
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:631)
> 2007-08-21 09:57:33,899 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200708210155_0003_r_000006_2 Copying task_200708210155_0003_m_002598_0 
> output from node50.inktomisearch.com.
> 2007-08-21 09:57:33,908 WARN org.apache.hadoop.mapred.ReduceTask: 
> task_200708210155_0003_r_000006_2 copy failed: 
> task_200708210155_0003_m_002598_0 from node50.inktomisearch.com
> 2007-08-21 09:57:33,908 WARN org.apache.hadoop.mapred.ReduceTask: 
> java.io.FileNotFoundException: 
> http://node50:50060/mapOutput?map=task_200708210155_0003_m_002598_0&reduce=6
>       at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1243)
>       at 
> org.apache.hadoop.mapred.MapOutputLocation.getFile(MapOutputLocation.java:207)
>       at 
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:673)
>       at 
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:631)
> 2007-08-21 10:00:56,337 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200708210155_0003_r_000006_2 Copying task_200708210155_0003_m_002598_1 
> output from node75.inktomisearch.com.
> 2007-08-21 10:00:56,342 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200708210155_0003_r_000006_2 done copying 
> task_200708210155_0003_m_002598_1 output from node75
> 2007-08-21 10:02:17,486 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200708210155_0003_r_000006_2 Ignoring obsolete copy result for Map Task: 
> task_200708210155_0003_m_002598_0 from host: node50
> Looking at TIP task_200708210155_0003_m_002598:
> task_200708210155_0003_m_002598_0     node50  KILLED  0.00%           21-Aug-2007 09:38:49    Lost task tracker
> task_200708210155_0003_m_002598_1     node75  KILLED  0.00%           21-Aug-2007 11:22:42    Lost task tracker
> task_200708210155_0003_m_002598_2     node55  SUCCEEDED       100.00% 21-Aug-2007 11:22:46    21-Aug-2007 11:27:19 (4mins, 33sec)
> task_200708210155_0003_m_002598_3     node49  KILLED  100.00% 21-Aug-2007 11:22:48    21-Aug-2007 11:27:48 (4mins, 59sec)     Already completed TIP
> Notes:
> 1. Even at the end, the reducer seems to fetch data from the wrong 
> TaskTracker; it does not check with the job tracker for the final, correct 
> map output location.
> 2. It seems to retry more times and to sleep for longer between retries 
> (judging by the intervals between log messages).
> 3. An obvious solution may be to ask the job tracker directly for the 
> correct map output location (I was able to fetch the correct map output from 
> node55 over HTTP without any errors).
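
Note 3 can be sketched in Java roughly as follows. This is a minimal illustration under stated assumptions, not Hadoop's actual code: `AttemptPicker`, `Attempt`, `Status`, `currentOutput`, and `shouldRetry` are hypothetical names, and the attempt list stands in for whatever the job tracker would report for the TIP. The idea is that after a failed copy the reducer looks at the latest attempt list and keeps retrying an attempt only if it is still the SUCCEEDED one.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch: these names are illustrative, not Hadoop's API.
final class AttemptPicker {
    enum Status { SUCCEEDED, KILLED, FAILED }

    /** One attempt of a map task, as the job tracker might report it. */
    static final class Attempt {
        final String id;
        final String host;
        final Status status;

        Attempt(String id, String host, Status status) {
            this.id = id;
            this.host = host;
            this.status = status;
        }
    }

    /** Returns the first SUCCEEDED attempt, i.e. where the map output lives. */
    static Optional<Attempt> currentOutput(List<Attempt> attempts) {
        for (Attempt a : attempts) {
            if (a.status == Status.SUCCEEDED) {
                return Optional.of(a);
            }
        }
        return Optional.empty();
    }

    /**
     * After a failed copy, retry the same attempt only if the latest
     * attempt list still names it as the successful one.
     */
    static boolean shouldRetry(Attempt failedCopy, List<Attempt> latest) {
        Optional<Attempt> current = currentOutput(latest);
        return current.isPresent() && current.get().id.equals(failedCopy.id);
    }
}
```

Fed the four attempts from the TIP listing above, `currentOutput` would pick the attempt on node55, and `shouldRetry` would return false for the attempt on node50, so the copier would stop retrying node50.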

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
