Try setting a lower value for mapred.job.shuffle.input.buffer.percent. The reducer uses it to decide whether to use an in-memory shuffle. The default value is 0.7, meaning 70% of the reducer's "memory" (its maximum heap) is used as the shuffle buffer.
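To make the arithmetic concrete, here is a minimal plain-Java sketch (no Hadoop classes) of how the buffer size follows from the percent and the JVM heap. It is not Hadoop's code. The class and method names are hypothetical. Two details are assumptions rather than something stated above: the Integer.MAX_VALUE cap on the buffer, and the 0.25 "single segment" fraction used to decide whether one map output is kept in memory.

```java
// Hypothetical sketch, not Hadoop code: sizes a reduce-side shuffle buffer
// from mapred.job.shuffle.input.buffer.percent and the JVM's maximum heap.
public class ShuffleBufferEstimate {

    // Default of mapred.job.shuffle.input.buffer.percent, as stated in the reply.
    static final float DEFAULT_BUFFER_PERCENT = 0.70f;

    // ASSUMPTION: a single map output is shuffled in memory only if it is
    // smaller than this fraction of the buffer (believed to be 0.25 in 1.x).
    static final float SINGLE_SEGMENT_FRACTION = 0.25f;

    // Bytes of heap reserved for in-memory map outputs. The Integer.MAX_VALUE
    // cap is an assumption about the 1.x implementation.
    static long shuffleBufferBytes(long maxHeapBytes, float bufferPercent) {
        if (bufferPercent < 0.0f || bufferPercent > 1.0f) {
            throw new IllegalArgumentException("bufferPercent must be in [0, 1]: " + bufferPercent);
        }
        return (long) Math.min(maxHeapBytes * (double) bufferPercent, Integer.MAX_VALUE);
    }

    // Whether one map output of the given size would be kept in memory
    // under the assumed single-segment rule.
    static boolean fitsInMemory(long segmentBytes, long bufferBytes) {
        return segmentBytes < (long) (bufferBytes * (double) SINGLE_SEGMENT_FRACTION);
    }

    public static void main(String[] args) {
        long heap = Runtime.getRuntime().maxMemory(); // this JVM's -Xmx
        for (float pct : new float[] {DEFAULT_BUFFER_PERCENT, 0.50f, 0.30f}) {
            long buf = shuffleBufferBytes(heap, pct);
            System.out.printf("percent=%.2f: shuffle buffer %d MiB, rest of heap %d MiB%n",
                    pct, buf >> 20, (heap - buf) >> 20);
        }
    }
}
```

Under these assumptions, lowering the percent shrinks the buffer and leaves more of the heap for reduce() itself. It also lowers the per-output threshold, so more map outputs go to disk instead of memory.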
On Thu, May 10, 2012 at 2:50 AM, Yang <teddyyyy...@gmail.com> wrote:
> It seems that if I put too many records into the same mapper output
> key, all these records are grouped into one key on one reducer,
>
> and then the reducer runs out of memory.
>
> But the reducer interface is:
>
> public void reduce(K key, Iterator<V> values,
>                    OutputCollector<K, V> output,
>                    Reporter reporter)
>
> so all the values belonging to the key can be iterated, so
> theoretically they can be iterated from disk and do not have to be
> in memory at the same time.
> So why am I getting an out-of-heap error? Is there some param I could
> tune (apart from -Xmx, since my box is ultimately bounded in memory
> capacity)?
>
> Thanks,
> Yang
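The point about the Iterator in the question can be illustrated with a small plain-Java sketch (no Hadoop classes; the class and method names are hypothetical). It contrasts a reduce body that streams over the values and keeps only a running total with one that copies every value into a list first. A lazily generated iterator stands in for Hadoop's value iterator, which is an assumption made only for illustration. Whether the reduce body buffers values is a separate concern from the shuffle buffer discussed in the reply.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.stream.LongStream;

public class StreamingReduce {

    // Streams over the values: only the running total is kept, so memory use
    // stays constant no matter how many values share the key.
    static long sumStreaming(Iterator<Long> values) {
        long total = 0;
        while (values.hasNext()) {
            total += values.next();
        }
        return total;
    }

    // Anti-pattern: copying every value into a list means all values for the
    // key are live on the heap at once, which -Xmx then has to accommodate.
    static long sumBuffered(Iterator<Long> values) {
        List<Long> all = new ArrayList<>();
        while (values.hasNext()) {
            all.add(values.next());
        }
        long total = 0;
        for (long v : all) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        // Lazily generated values stand in for an on-disk value iterator.
        Iterator<Long> values = LongStream.rangeClosed(1, 10_000_000L).boxed().iterator();
        System.out.println(sumStreaming(values)); // prints 50000005000000
    }
}
```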