On Tue, 2009-03-10 at 19:44 -0700, Gyanit wrote: > I have large number of key,value pairs. I don't actually care if data goes in > value or key. Let me be more exact. > (k,v) pair after combiner is about 1 mil. I have approx 1kb data for each > pair. I can put it in keys or values. > I have experimented with both options (heavy key , light value) vs (light > key, heavy value). It turns out that hk,lv option is much much better than > (lk,hv). <snip> > There is a difference of time in shuffle phase. Which is weird as amount of > data transferred is same.
just an idea, but is this related to the hash function? are there the same number of reducers no matter which you do? As I understand it the reducers merge-sort the data while the shuffle is happening, so if each reducer has to sort less data this could be part of it.
