Hi all,

A question about imbalanced data came to mind today.

I'd like to know how Hadoop handles this kind of data. For example, suppose
one key dominates the map output, say 99% of the records. Then 99% of the
data set goes to a single reducer, and that reducer becomes the bottleneck.
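
To make the scenario concrete, here is a rough Python sketch (a simulation, not
Hadoop code). It assumes records are assigned to reducers by hash(key) modulo
the number of reducers. The function names, the CRC32 hash, and the record
counts are stand-ins chosen for illustration only.

```python
import zlib
from collections import Counter


def partition(key: str, num_reducers: int) -> int:
    # Stand-in for a hash partitioner: hash(key) mod num_reducers.
    # CRC32 is used only because it is deterministic across runs.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reducer_loads(records, num_reducers):
    # Count how many map-output records land on each reducer.
    loads = Counter()
    for key, _value in records:
        loads[partition(key, num_reducers)] += 1
    return loads


# 9,900 of 10,000 records (99%) share the single key "hot".
records = [("hot", 1)] * 9900 + [(f"key{i}", 1) for i in range(100)]

loads = reducer_loads(records, num_reducers=10)
busiest, count = loads.most_common(1)[0]
share = count / len(records)
print(f"reducer {busiest} receives {count} records ({share:.1%})")
```

Every record with the same key hashes to the same reducer, so adding more
reducers does not spread out the dominant key.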

Does Hadoop have a better way to handle such an imbalanced data set?


Jeff Zhang
