On Fri, Aug 27, 2010 at 3:13 PM, xu cheng <[email protected]> wrote:
> hello guys:
> I'm doing some experiments on my 3-node virtual machine cluster running
> Hadoop 0.21.0: one node for the namenode and jobtracker, and the other two
> for datanodes and tasktrackers.
> When I run a job, I get messages like these:
>
> 10/08/27 17:28:58 INFO mapreduce.Job: map 0% reduce 0%
> 10/08/27 17:29:10 INFO mapreduce.Job: map 50% reduce 0%
> 10/08/27 17:29:12 INFO mapreduce.Job: map 83% reduce 0%
> 10/08/27 17:29:19 INFO mapreduce.Job: map 83% reduce 16%
> 10/08/27 17:29:24 INFO mapreduce.Job: map 100% reduce 16%
>
> The reduce runs while the map tasks haven't finished! (I read in the books
> that the reduce tasks run only after the maps finish!)

The reducer operation consists of three steps - copy, sort, and reduce -
each contributing roughly a third of the reported progress. What you're
seeing at this point is the copy phase, which starts fetching map outputs
as soon as individual maps complete. The 'reduce' phase itself indeed runs
only after all maps finish, that is true.
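To make the breakdown concrete, here is a minimal sketch of my own (not
code from this thread) that polls a submitted job through the
org.apache.hadoop.mapreduce.Job API; the ProgressWatcher name and the
5-second poll interval are arbitrary. A reported "reduce 16%" falls in the
first third, i.e. roughly halfway through the copy phase:

// Minimal sketch: watch a running job and label which reduce-side phase
// the single reduce percentage currently corresponds to.
import org.apache.hadoop.mapreduce.Job;

public class ProgressWatcher {
    public static void watch(Job job) throws Exception {
        while (!job.isComplete()) {
            float reduce = job.reduceProgress(); // 0.0-1.0 across all three phases
            String phase;
            if (reduce < 1f / 3) {
                phase = "copy";    // fetching map outputs as maps complete
            } else if (reduce < 2f / 3) {
                phase = "sort";    // merging the fetched segments
            } else {
                phase = "reduce";  // running the user's reduce()
            }
            System.out.printf("map %3.0f%% reduce %3.0f%% (reduce-side phase: ~%s)%n",
                    job.mapProgress() * 100, reduce * 100, phase);
            Thread.sleep(5000);
        }
    }
}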
> Is there something wrong with the cluster, or with my knowledge?
>
> By the way, the reduce job got stuck at 16.63% progress. I found that
> people on the internet have also hit this problem, but I haven't found a
> solution.
>
> However, after some time, after the system reported the error message, the
> job began to run again, like this:
>
> 10/08/27 17:29:58 INFO mapreduce.Job: map 100% reduce 16%
> 10/08/27 17:30:01 INFO mapreduce.Job: map 50% reduce 16%
> 10/08/27 17:30:07 INFO mapreduce.Job: map 83% reduce 16%
> 10/08/27 17:30:19 INFO mapreduce.Job: map 100% reduce 16%
> 10/08/27 17:30:25 INFO mapreduce.Job: map 100% reduce 66%
> 10/08/27 17:30:31 INFO mapreduce.Job: map 100% reduce 100%
>
> and then it ran pretty well! Does someone know about this?
>
> Below are the messages from the console. If the logs are needed, let me
> know. Thanks.
>
> Any suggestions and references are appreciated.
> best regards
> xu

Are you running a firewall, and if you are, is it configured properly?
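Reducers fetch map outputs over HTTP from each tasktracker (port 50060 by
default), so a blocked port shows up exactly as the "Connection refused"
and Exceeded MAX_FAILED_UNIQUE_FETCHES errors in your log below. As a quick
check, a sketch like this can probe the shuffle port from each node; the
worker host names and the 2-second timeout are placeholders for your setup:

// Minimal sketch: test whether the default TaskTracker HTTP (shuffle) port
// is reachable on each worker host.
import java.net.InetSocketAddress;
import java.net.Socket;

public class ShuffleProbe {
    public static void main(String[] args) {
        String[] hosts = {"worker1", "worker2"}; // substitute your datanode/tasktracker hosts
        int port = 50060; // default tasktracker HTTP port used by the shuffle
        for (String host : hosts) {
            Socket s = new Socket();
            try {
                s.connect(new InetSocketAddress(host, port), 2000);
                System.out.println(host + ":" + port + " reachable");
            } catch (Exception e) {
                System.out.println(host + ":" + port + " NOT reachable: " + e);
            } finally {
                try { s.close(); } catch (Exception ignored) {}
            }
        }
    }
}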
> 10/08/27 17:28:56 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
> 10/08/27 17:28:56 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
> 10/08/27 17:28:56 WARN mapreduce.JobSubmitter: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> 10/08/27 17:28:56 INFO input.FileInputFormat: Total input paths to process : 1
> 10/08/27 17:28:56 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
> 10/08/27 17:28:56 INFO mapreduce.JobSubmitter: number of splits:2
> 10/08/27 17:28:57 INFO mapreduce.JobSubmitter: adding the following namenodes' delegation tokens:null
> 10/08/27 17:28:57 INFO mapreduce.Job: Running job: job_201008271725_0001
> 10/08/27 17:28:58 INFO mapreduce.Job: map 0% reduce 0%
> 10/08/27 17:29:10 INFO mapreduce.Job: map 50% reduce 0%
> 10/08/27 17:29:12 INFO mapreduce.Job: map 83% reduce 0%
> 10/08/27 17:29:19 INFO mapreduce.Job: map 83% reduce 16%
> 10/08/27 17:29:24 INFO mapreduce.Job: map 100% reduce 16%
> 10/08/27 17:29:48 INFO mapreduce.Job: Task Id : attempt_201008271725_0001_r_000000_0, Status : FAILED
> org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#1
>     at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:124)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:362)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:742)
>     at org.apache.hadoop.mapred.Child.main(Child.java:211)
> Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
>     at org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler.checkReducerHealth(ShuffleScheduler.java:253)
>     at org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler.copyFailed(ShuffleScheduler.java:187)
>     at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:234)
>     at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:149)
> 10/08/27 17:29:48 WARN mapreduce.Job: Error reading task outputConnection refused
> 10/08/27 17:29:48 WARN mapreduce.Job: Error reading task outputConnection refused
> 10/08/27 17:29:49 INFO mapreduce.Job: map 100% reduce 0%
> 10/08/27 17:29:57 INFO mapreduce.Job: Task Id : attempt_201008271725_0001_m_000000_0, Status : FAILED
> Too many fetch-failures
> 10/08/27 17:29:57 WARN mapreduce.Job: Error reading task outputConnection refused
> 10/08/27 17:29:57 WARN mapreduce.Job: Error reading task outputConnection refused
> 10/08/27 17:29:58 INFO mapreduce.Job: map 100% reduce 16%
> 10/08/27 17:30:01 INFO mapreduce.Job: map 50% reduce 16%
> 10/08/27 17:30:07 INFO mapreduce.Job: map 83% reduce 16%
> 10/08/27 17:30:19 INFO mapreduce.Job: map 100% reduce 16%
> 10/08/27 17:30:25 INFO mapreduce.Job: map 100% reduce 66%
> 10/08/27 17:30:31 INFO mapreduce.Job: map 100% reduce 100%
> 10/08/27 17:30:33 INFO mapreduce.Job: Job complete: job_201008271725_0001
> 10/08/27 17:30:33 INFO mapreduce.Job: Counters: 33
>   FileInputFormatCounters
>     BYTES_READ=76420532
>   FileSystemCounters
>     FILE_BYTES_READ=155979268
>     FILE_BYTES_WRITTEN=239598906
>     HDFS_BYTES_READ=76424828
>     HDFS_BYTES_WRITTEN=78386951
>   Shuffle Errors
>     BAD_ID=0
>     CONNECTION=0
>     IO_ERROR=2
>     WRONG_LENGTH=0
>     WRONG_MAP=0
>     WRONG_REDUCE=0
>   Job Counters
>     Data-local map tasks=3
>     Total time spent by all maps waiting after reserving slots (ms)=0
>     Total time spent by all reduces waiting after reserving slots (ms)=0
>     SLOTS_MILLIS_MAPS=47747
>     SLOTS_MILLIS_REDUCES=77241
>     Launched map tasks=3
>     Launched reduce tasks=2
>   Map-Reduce Framework
>     Combine input records=999998
>     Combine output records=994179
>     Failed Shuffles=1
>     GC time elapsed (ms)=667
>     Map input records=499999
>     Map output bytes=80759850
>     Map output records=999998
>     Merged Map outputs=2
>     Reduce input groups=993814
>     Reduce input records=994179
>     Reduce output records=993814
>     Reduce shuffle bytes=83049802
>     Shuffled Maps=2
>     Spilled Records=2861309
>     SPLIT_RAW_BYTES=200

--
Harsh J
www.harshj.com
