Thanks, let me run more tests with the settings suggested later in this thread, and then I'll provide the details.
On Wed, May 9, 2012 at 10:12 PM, Harsh J <ha...@cloudera.com> wrote:
> Can you share your job details (or a sample of your reducer code) and also
> share the exact error?
>
> If you are holding reducer-provided values/keys in memory in your
> implementation, that can easily cause an OOME if not handled properly.
> The reducer by itself reads the values off a sorted file on
> disk and doesn't cache the whole group in memory.
>
> On Thu, May 10, 2012 at 12:20 AM, Yang <teddyyyy...@gmail.com> wrote:
>> It seems that if I put too many records under the same mapper output
>> key, all these records are grouped into one key on one reducer,
>> and then the reducer runs out of memory.
>>
>> But the reducer interface is:
>>
>> public void reduce(K key, Iterator<V> values,
>>                    OutputCollector<K, V> output,
>>                    Reporter reporter)
>>
>> so all the values belonging to the key are iterated. In theory they
>> can be read from disk and do not all have to be in memory at the
>> same time. So why am I getting an out-of-heap error? Is there some
>> parameter I could tune (apart from -Xmx, since my box is ultimately
>> bounded in memory capacity)?
>>
>> thanks
>> Yang
>
> --
> Harsh J
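Harsh's point about where the memory goes can be sketched in plain Java. This is a minimal illustration under stated assumptions, not Hadoop code: to keep it self-contained, a plain `java.util.Iterator<Long>` stands in for the `Iterator<V>` passed to the old-API `reduce()`, and the names `StreamingReduceSketch`, `sumStreaming`, `sumBuffered` and `lazyValues` are hypothetical. The first method keeps constant state per key; the second copies the whole group into a list, which is the pattern that turns one very hot key into an OOME.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class StreamingReduceSketch {

    // Streaming style: each value is consumed once and only a running
    // total is kept, so heap use does not grow with the group size.
    public static long sumStreaming(Iterator<Long> values) {
        long sum = 0;
        while (values.hasNext()) {
            sum += values.next();
        }
        return sum;
    }

    // Buffering style: every value is copied into a list before it is
    // processed. For a key with a huge group, this list (not the
    // framework's iterator) is what exhausts the heap.
    public static long sumBuffered(Iterator<Long> values) {
        List<Long> all = new ArrayList<>();
        while (values.hasNext()) {
            all.add(values.next());
        }
        long sum = 0;
        for (long v : all) {
            sum += v;
        }
        return sum;
    }

    // Hypothetical stand-in for the framework's value iterator: yields
    // 1..n lazily instead of materializing all values up front.
    public static Iterator<Long> lazyValues(final long n) {
        return new Iterator<Long>() {
            private long next = 1;

            @Override
            public boolean hasNext() {
                return next <= n;
            }

            @Override
            public Long next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return next++;
            }
        };
    }

    public static void main(String[] args) {
        // Ten million values under one "key", summed without buffering.
        System.out.println(sumStreaming(lazyValues(10_000_000L)));
    }
}
```

With the streaming version, memory use for a key stays flat no matter how many values the key has, while the buffered version's list grows linearly with the group. In real Hadoop code the values would be Writables such as `LongWritable`, and the framework typically reuses a single value object across `next()` calls, so a buffering reducer also has to copy each value, which adds further to the heap cost.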