thanks, let me try this

On Wed, May 9, 2012 at 11:27 PM, Zizon Qiu <zzd...@gmail.com> wrote:

> try setting a lower value for mapred.job.shuffle.input.buffer.percent.
> the reducer uses it to decide whether to use in-memory shuffle.
> the default value is 0.7, meaning 70% of the "memory" is used as the
> shuffle buffer.
>
> On Thu, May 10, 2012 at 2:50 AM, Yang <teddyyyy...@gmail.com> wrote:
>
>> it seems that if I put too many records into the same mapper output
>> key, all these records are grouped into one key on one reducer,
>>
>> then the reducer runs out of memory.
>>
>>
>> but the reducer interface is:
>>
>>     public void reduce(K key, Iterator<V> values,
>>                        OutputCollector<K, V> output,
>>                        Reporter reporter)
>>
>>
>> so all the values belonging to the key can be iterated, so
>> theoretically they can be iterated from disk, and do not have to be
>> in memory at the same time,
>> so why am I getting an out-of-heap error? is there some param I could
>> tune (apart from -Xmx, since my box is ultimately bounded in memory
>> capacity)?
>>
>> thanks
>> Yang
>>
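
For a rough sense of what the 0.7 default quoted above means in bytes, here is a minimal sketch. It assumes the "memory" in question is the reducer JVM's maximum heap (the value bounded by -Xmx); that reading is an assumption, not something stated in the thread. The class and method names are hypothetical and are not part of Hadoop.

```java
// Hypothetical helper, NOT part of Hadoop: estimates how large the in-memory
// shuffle buffer would be for a given mapred.job.shuffle.input.buffer.percent.
// Assumption: the buffer is sized as a fraction of the JVM's maximum heap.
class ShuffleBufferSketch {

    // Returns the number of bytes covered by `percent` of `maxHeapBytes`.
    static long shuffleBufferBytes(long maxHeapBytes, double percent) {
        if (percent < 0.0 || percent > 1.0) {
            throw new IllegalArgumentException("percent must be in [0, 1]");
        }
        return (long) (maxHeapBytes * percent);
    }

    public static void main(String[] args) {
        // Maximum heap this JVM may use, which is what -Xmx bounds.
        long maxHeap = Runtime.getRuntime().maxMemory();
        // 0.7 is the default quoted in the thread; the others are lower values to compare.
        double[] candidates = {0.7, 0.5, 0.3};
        for (double p : candidates) {
            System.out.printf("percent=%.1f -> about %d MB of a %d MB heap%n",
                    p, shuffleBufferBytes(maxHeap, p) >> 20, maxHeap >> 20);
        }
    }
}
```

Running it with the same -Xmx as the reduce task should show how much heap each setting would leave for the reduce function itself. In a real job the property would be set in the job configuration; if the driver goes through ToolRunner, passing -D mapred.job.shuffle.input.buffer.percent=0.3 on the command line is probably the least invasive way to try a lower value.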