Reducers are supposed to stream out data. Do not wait for the last
value to arrive in reduce() before writing it to the collector. Call
output.collect() on each value as it arrives.
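
A rough sketch of what I mean, not your actual job. It assumes the sum can
go out as one trailing record per key instead of being attached to every
value. Collector and StreamingSumSketch are made-up names; Collector is a
stand-in for Hadoop's OutputCollector so the snippet compiles without
Hadoop on the classpath.

```java
import java.util.Iterator;

// Hypothetical stand-in for org.apache.hadoop.mapred.OutputCollector.
// The real collect() has the same shape but also throws IOException.
interface Collector<K, V> {
    void collect(K key, V value);
}

class StreamingSumSketch {
    // Emits each value as soon as it is read and keeps only a running
    // total, so memory use does not grow with the number of values per key.
    static void reduce(String key, Iterator<Long> values,
                       Collector<String, String> output) {
        long sum = 0;
        while (values.hasNext()) {
            long value = values.next();
            sum += value;
            // Stream the value out immediately instead of buffering it.
            output.collect(key, Long.toString(value));
        }
        // Assumption: the total may be written as a final record for the key.
        output.collect(key, "sum:" + sum);
    }
}
```

If every output record really has to carry the sum, buffering all values
in the reducer is what hurts; computing the sums in a first pass and
joining them back in a second job is one way around it.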
On Jan 6, 2011, at 1:26 AM, Dhaval Makawana <[email protected]> wrote:
Hi,
I have 1 GB of input data with 10954103 records going to 7 reducer tasks.
The cluster is set up such that each reducer gets 1 GB of RAM, of which
256 MB is reserved for sorting. The reducer simply sums one field of the
values and outputs all the values along with the sum. The problem is that
all reducers get stuck at around 66% of processing. The tail of the task
tracker log reads as follows.
2011-01-06 08:39:31,309 INFO org.apache.hadoop.mapred.ReduceTask: Interleaved on-disk merge complete: 0 files left.
2011-01-06 08:39:31,309 INFO org.apache.hadoop.mapred.ReduceTask: In-memory merge complete: 2 files left.
2011-01-06 08:39:31,333 INFO org.apache.hadoop.mapred.Merger: Merging 2 sorted segments
2011-01-06 08:39:31,334 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 25352535 bytes
2011-01-06 08:39:31,343 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor
2011-01-06 08:39:31,704 INFO org.apache.hadoop.mapred.ReduceTask: Merged 2 segments, 25352537 bytes to disk to satisfy reduce memory limit
2011-01-06 08:39:31,704 INFO org.apache.hadoop.mapred.ReduceTask: Merging 1 files, 2947959 bytes from disk
2011-01-06 08:39:31,705 INFO org.apache.hadoop.mapred.ReduceTask: Merging 0 segments, 0 bytes from memory into reduce
2011-01-06 08:39:31,705 INFO org.apache.hadoop.mapred.Merger: Merging 1 sorted segments
2011-01-06 08:39:31,709 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 2947955 bytes
Please help me solve this problem.
Regards,
Dhaval