I'm trying to figure out why, when I use a single input file that is relatively
small (< 3 MB, ~95K records), my Reducer hangs at 33% (still in the
shuffle/copy phase, I believe) and the attempt never completes; the output is
below. It looks like there's a problem when the task tries to write to its log
file: the write gets interrupted, the resulting InterruptedIOException is never
caught, and the thread terminates. The Reducer task then sits there until the
JobTracker kills it and starts another attempt, which inevitably hits the same
problem.
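For anyone following along, here's a minimal standalone sketch of the failure
mode I suspect. This is not the Hadoop code itself, and whether an interrupt
actually surfaces as an InterruptedIOException mid-write is platform-dependent,
so treat it as an illustration only:

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InterruptedIOException;

public class InterruptedWriteSketch {
    public static void main(String[] args) throws Exception {
        File tmp = File.createTempFile("sketch", ".log");
        tmp.deleteOnExit();
        Thread writer = new Thread(() -> {
            try (FileOutputStream out = new FileOutputStream(tmp)) {
                byte[] chunk = new byte[8192];
                // Keep writing until interrupted; on platforms where file
                // writes are interruptible, write() itself can fail with
                // InterruptedIOException (the writeBytes frame in my trace).
                while (!Thread.currentThread().isInterrupted()) {
                    out.write(chunk);
                }
            } catch (InterruptedIOException e) {
                // The Fetcher apparently has no equivalent of this catch,
                // so the exception propagates and the thread dies.
                Thread.currentThread().interrupt(); // restore interrupt flag
                System.err.println("write interrupted: " + e);
            } catch (IOException e) {
                e.printStackTrace();
            }
        });
        writer.start();
        Thread.sleep(100);
        writer.interrupt(); // stand-in for whatever interrupts the Fetcher
        writer.join();
    }
}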
When I take that same data and split it up into multiple files (5, 10,
whichever, just > 1), it runs fine, no issues. Very strange; does anyone have
any ideas?
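For what it's worth, I could probably get the same effect without physically
splitting the file by capping the input split size. Here's a rough sketch,
assuming the new mapreduce API; the job setup and the 512 KB cap are both
illustrative, not tested:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class SplitSizeSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "small-input-sketch");
        // mapper/reducer/input/output path setup omitted
        // Cap each split at ~512 KB so a < 3 MB file yields several map
        // tasks (and therefore several map outputs for the shuffle).
        FileInputFormat.setMaxInputSplitSize(job, 512L * 1024);
        // job.waitForCompletion(true);
    }
}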
--Aaron
stderr logs
log4j:ERROR Failed to flush writer,
java.io.InterruptedIOException
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:260)
    at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:202)
    at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:272)
    at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:276)
    at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:122)
    at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:212)
    at org.apache.log4j.helpers.QuietWriter.flush(QuietWriter.java:58)
    at org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:316)
    at org.apache.log4j.WriterAppender.append(WriterAppender.java:160)
    at org.apache.hadoop.mapred.TaskLogAppender.append(TaskLogAppender.java:58)
    at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
    at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
    at org.apache.log4j.Category.callAppenders(Category.java:206)
    at org.apache.log4j.Category.forcedLog(Category.java:391)
    at org.apache.log4j.Category.log(Category.java:856)
    at org.apache.commons.logging.impl.Log4JLogger.info(Log4JLogger.java:199)
    at org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler.freeHost(ShuffleScheduler.java:345)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:152)
syslog logs
2011-03-15 17:02:32,258 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=SHUFFLE, sessionId=
2011-03-15 17:02:32,265 WARN org.apache.hadoop.conf.Configuration: user.name is deprecated. Instead, use mapreduce.job.user.name
2011-03-15 17:02:32,360 WARN org.apache.hadoop.conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
2011-03-15 17:02:32,413 INFO org.apache.hadoop.mapreduce.task.reduce.MergeManager: MergerManager: memoryLimit=130514944, maxSingleShuffleLimit=32628736, mergeThreshold=86139864, ioSortFactor=10, memToMemMergeOutputsThreshold=10
2011-03-15 17:02:32,416 INFO org.apache.hadoop.mapreduce.task.reduce.EventFetcher: attempt_201103141158_0104_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
2011-03-15 17:02:32,423 INFO org.apache.hadoop.mapreduce.task.reduce.EventFetcher: attempt_201103141158_0104_r_000000_0: Got 1 new map-outputs
2011-03-15 17:02:32,423 INFO org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging 10.10.11.50:50060 with 1 to fetcher#5
2011-03-15 17:02:32,423 INFO org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to 10.10.11.50:50060 to fetcher#5
2011-03-15 17:02:32,730 INFO org.apache.hadoop.mapreduce.task.reduce.Fetcher: for url=50060/mapOutput?job=job_201103141158_0104&reduce=0&map=attempt_201103141158_0104_m_000000_0 sent hash and receievd reply
2011-03-15 17:02:32,735 INFO org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#5 about to shuffle output of map attempt_201103141158_0104_m_000000_0 decomp: 2050216 len: 2050220 to MEMORY
2011-03-15 17:02:32,790 INFO org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 2050216 bytes from map-output for attempt_201103141158_0104_m_000000_0
2011-03-15 17:02:32,790 INFO org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> map-output of size: 2050216, inMemoryMapOutputs.size() -> 1