[
https://issues.apache.org/jira/browse/MAPREDUCE-2724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Joseph Evans updated MAPREDUCE-2724:
-------------------------------------------
Status: Open (was: Patch Available)
I am not sure that 0.21 is really supported any more. In either case, you will
need to post test results for your patch.
> Reducer fetcher synchronize problem
> -----------------------------------
>
> Key: MAPREDUCE-2724
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2724
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: task
> Affects Versions: 0.21.0
> Environment: ubuntu 10.10
> m64
> Reporter: ShengpengYu
> Fix For: 0.21.0
>
> Attachments: patch.txt
>
>
> I've recently been using Hadoop (version 0.21.0) for some data processing, but
> the reducer sometimes crashes. The log always looks like the one below, which
> suggests that when multiple fetchers receive map outputs simultaneously, the
> in-memory merge can fail due to a synchronization problem. To verify this
> assumption, I set the number of fetchers to 1
> (mapreduce.reduce.shuffle.parallelcopies), after which the problem disappeared
> and the job ran correctly.
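[Editor's note: the workaround described above corresponds to a mapred-site.xml
entry like the following; the property name is taken from the report, and
setting it to 1 serializes the reduce-side copies at the cost of shuffle
throughput.]

```xml
<!-- mapred-site.xml: run a single reduce-side fetcher as a workaround -->
<property>
  <name>mapreduce.reduce.shuffle.parallelcopies</name>
  <value>1</value>
</property>
```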
> 2011-07-20 18:56:34,999 INFO
> org.apache.hadoop.mapreduce.task.reduce.EventFetcher:
> attempt_201107140944_0086_r_000000_0: Got 2 new map-outputs
> 2011-07-20 18:56:35,000 INFO
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging
> dc-0005:50060 with 1 to fetcher#1
> 2011-07-20 18:56:35,000 INFO
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to
> dc-0005:50060 to fetcher#1
> 2011-07-20 18:56:35,000 INFO
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging
> dc-0002:50060 with 1 to fetcher#3
> 2011-07-20 18:56:35,000 INFO
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to
> dc-0002:50060 to fetcher#3
> 2011-07-20 18:56:35,006 INFO org.apache.hadoop.mapreduce.task.reduce.Fetcher:
> for
> url=50060/mapOutput?job=job_201107140944_0086&reduce=0&map=attempt_201107140944_0086_m_000498_0
> sent hash and receievd reply
> 2011-07-20 18:56:35,016 INFO org.apache.hadoop.mapreduce.task.reduce.Fetcher:
> for
> url=50060/mapOutput?job=job_201107140944_0086&reduce=0&map=attempt_201107140944_0086_m_000491_0
> sent hash and receievd reply
> 2011-07-20 18:56:35,056 INFO org.apache.hadoop.mapreduce.task.reduce.Fetcher:
> fetcher#3 about to shuffle output of map attempt_201107140944_0086_m_000498_0
> decomp: 12647556 len: 12647560 to MEMORY
> 2011-07-20 18:56:35,070 INFO org.apache.hadoop.mapreduce.task.reduce.Fetcher:
> fetcher#1 about to shuffle output of map attempt_201107140944_0086_m_000491_0
> decomp: 30172760 len: 30172764 to MEMORY
> 2011-07-20 18:56:35,553 INFO org.apache.hadoop.mapreduce.task.reduce.Fetcher:
> Read 12647556 bytes from map-output for attempt_201107140944_0086_m_000498_0
> 2011-07-20 18:56:35,553 INFO
> org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile ->
> map-output of size: 12647556, inMemoryMapOutputs.size() -> 25
> 2011-07-20 18:56:35,553 INFO
> org.apache.hadoop.mapreduce.task.reduce.MergeManager: Starting
> inMemoryMerger's merge since usedMemory=479526428 > mergeThreshold=440963456
> 2011-07-20 18:56:35,553 INFO
> org.apache.hadoop.mapreduce.task.reduce.MergeThread: InMemoryMerger - Thread
> to merge in-memory shuffled map-outputs: Starting merge with 25 segments,
> while ignoring 0 segments
> 2011-07-20 18:56:35,554 INFO
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: dc-0002:50060 freed
> by fetcher#3 in 554s
> 2011-07-20 18:56:35,556 INFO
> org.apache.hadoop.mapreduce.task.reduce.MergeManager: Initiating in-memory
> merge with 25 segments...
> 2011-07-20 18:56:35,557 INFO org.apache.hadoop.mapred.Merger: Merging 25
> sorted segments
> 2011-07-20 18:56:35,557 INFO org.apache.hadoop.mapred.Merger: Down to the
> last merge-pass, with 25 segments left of total size: 449352783 bytes
> 2011-07-20 18:56:35,696 INFO org.apache.hadoop.mapreduce.task.reduce.Fetcher:
> Read 30172760 bytes from map-output for attempt_201107140944_0086_m_000491_0
> 2011-07-20 18:56:35,696 INFO
> org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile ->
> map-output of size: 30172760, inMemoryMapOutputs.size() -> 1
> 2011-07-20 18:56:35,696 INFO
> org.apache.hadoop.mapreduce.task.reduce.MergeManager: Starting
> inMemoryMerger's merge since usedMemory=479526428 > mergeThreshold=440963456
> 2011-07-20 18:56:35,696 INFO
> org.apache.hadoop.mapreduce.task.reduce.MergeThread: InMemoryMerger - Thread
> to merge in-memory shuffled map-outputs: Starting merge with 1 segments,
> while ignoring 0 segments
> 2011-07-20 18:56:35,696 INFO
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: dc-0005:50060 freed
> by fetcher#1 in 696s
> 2011-07-20 18:56:41,540 WARN org.apache.hadoop.mapred.Child: Exception
> running child : org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError:
> error in shuffle in InMemoryMerger - Thread to merge in-memory shuffled
> map-outputs
> at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:124)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:362)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:742)
> at org.apache.hadoop.mapred.Child.main(Child.java:211)
> Caused by: java.lang.RuntimeException: java.io.EOFException
> at
> org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:132)
> at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:530)
> at org.apache.hadoop.util.PriorityQueue.downHeap(PriorityQueue.java:141)
> at org.apache.hadoop.util.PriorityQueue.adjustTop(PriorityQueue.java:108)
> at
> org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:478)
> at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:493)
> at org.apache.hadoop.mapred.Merger.writeFile(Merger.java:199)
> at
> org.apache.hadoop.mapreduce.task.reduce.MergeManager$InMemoryMerger.merge(MergeManager.java:443)
> at
> org.apache.hadoop.mapreduce.task.reduce.MergeThread.run(MergeThread.java:89)
> Caused by: java.io.EOFException
> at java.io.DataInputStream.readFully(DataInputStream.java:180)
> at java.io.DataInputStream.readFully(DataInputStream.java:152)
> at com.iflytek.hadoop.streaming.io.BaseKey32.readFields(BaseKey32.java:24)
> at
> org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:129)
> ... 8 more
> 2011-07-20 18:56:41,552 INFO org.apache.hadoop.mapred.Task: Runnning cleanup
> for the task
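[Editor's note: the interleaving above is suggestive — both merges report the
same stale usedMemory=479526428, and inMemoryMapOutputs.size() drops from 25 to
1 while fetcher#1's output is still being read. A minimal sketch, assuming
nothing about the attached patch.txt, of the kind of fix this points at: make
"close a map output and decide whether to merge" a single atomic step, so two
fetchers finishing simultaneously cannot both act on stale accounting. All
class and method names below are hypothetical, not Hadoop's.]

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: an in-memory merge manager where closing a map
// output and triggering the merge happen under one lock, so concurrent
// fetchers always see consistent usedMemory/segment state.
class MiniMergeManager {
    private final long mergeThreshold;
    private long usedMemory = 0;
    private final List<Long> inMemorySegments = new ArrayList<>();
    private int mergesStarted = 0;

    MiniMergeManager(long mergeThreshold) {
        this.mergeThreshold = mergeThreshold;
    }

    // Called by each fetcher thread once a map output is fully read.
    synchronized void closeInMemoryFile(long size) {
        usedMemory += size;
        inMemorySegments.add(size);
        if (usedMemory > mergeThreshold) {
            // Hand the whole segment list to the merger atomically,
            // then reset the accounting in the same critical section.
            mergesStarted++;
            inMemorySegments.clear();
            usedMemory = 0;
        }
    }

    synchronized int mergesStarted() { return mergesStarted; }

    synchronized long usedMemory() { return usedMemory; }
}

public class ShuffleRaceSketch {
    public static void main(String[] args) throws InterruptedException {
        MiniMergeManager mm = new MiniMergeManager(440_963_456L);
        // Two fetchers deliver the two output sizes seen in the log.
        Thread f1 = new Thread(() -> mm.closeInMemoryFile(30_172_760L));
        Thread f3 = new Thread(() -> mm.closeInMemoryFile(12_647_556L));
        f1.start(); f3.start();
        f1.join(); f3.join();
        // These two outputs alone stay under the threshold, so no merge
        // starts and every byte remains accounted for.
        System.out.println(mm.mergesStarted() + " " + mm.usedMemory());
    }
}
```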
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira