Reducers are supposed to stream their output. Do not wait for the last value to arrive in reduce() before writing to the collector; call output.collect on each value as it arrives.
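A rough sketch of what I mean, in plain Java so it runs without a cluster. The `Collector` interface here is a hypothetical stand-in for Hadoop's `OutputCollector` (only `collect(key, value)` is modeled). The class name, the `run` helper and the `":sum"` key suffix are made up for illustration. I am also assuming it is acceptable to emit the sum as one extra record after the values, rather than attached to every value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class StreamingSumReducer {

    /** Hypothetical stand-in for Hadoop's OutputCollector; only collect(key, value) is modeled. */
    interface Collector<K, V> {
        void collect(K key, V value);
    }

    /**
     * Emits each value as soon as it is read and keeps only a running sum,
     * so the reducer never holds all values of a key in memory.
     */
    static long reduce(String key, Iterator<Long> values, Collector<String, Long> output) {
        long sum = 0;
        while (values.hasNext()) {
            long v = values.next();
            sum += v;
            output.collect(key, v);        // stream the value out immediately
        }
        output.collect(key + ":sum", sum); // assumed format: total as one final record
        return sum;
    }

    /** Demo helper (hypothetical): runs reduce and records the output as "key=value" strings. */
    static List<String> run(String key, List<Long> values) {
        List<String> out = new ArrayList<>();
        reduce(key, values.iterator(), (k, v) -> out.add(k + "=" + v));
        return out;
    }

    public static void main(String[] args) {
        // prints [k=1, k=2, k=3, k:sum=6]
        System.out.println(run("k", Arrays.asList(1L, 2L, 3L)));
    }
}
```

If the sum really has to appear next to every value, a single streaming pass cannot do it, because the total is only known after the last value. In that case the job would probably need a second pass or a different output layout.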


On Jan 6, 2011, at 1:26 AM, Dhaval Makawana wrote:

Hi,



I have 1 GB of input data with 10954103 records going to 7 reducer tasks. The cluster is set up such that each reducer gets 1 GB of RAM, of which 256 MB is reserved for sorting. The reducer simply sums one field of the values and outputs all the values along with the sum. The problem is that all reducers get stuck at around 66% of processing. The tail of the task tracker log reads as below.



2011-01-06 08:39:31,309 INFO org.apache.hadoop.mapred.ReduceTask: Interleaved on-disk merge complete: 0 files left.
2011-01-06 08:39:31,309 INFO org.apache.hadoop.mapred.ReduceTask: In-memory merge complete: 2 files left.
2011-01-06 08:39:31,333 INFO org.apache.hadoop.mapred.Merger: Merging 2 sorted segments
2011-01-06 08:39:31,334 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 25352535 bytes
2011-01-06 08:39:31,343 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor
2011-01-06 08:39:31,704 INFO org.apache.hadoop.mapred.ReduceTask: Merged 2 segments, 25352537 bytes to disk to satisfy reduce memory limit
2011-01-06 08:39:31,704 INFO org.apache.hadoop.mapred.ReduceTask: Merging 1 files, 2947959 bytes from disk
2011-01-06 08:39:31,705 INFO org.apache.hadoop.mapred.ReduceTask: Merging 0 segments, 0 bytes from memory into reduce
2011-01-06 08:39:31,705 INFO org.apache.hadoop.mapred.Merger: Merging 1 sorted segments
2011-01-06 08:39:31,709 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 2947955 bytes


Please help me solve this problem.


Regards,

Dhaval
