How to handle imbalanced data in Hadoop?

Jeff Zhang Sat, 14 Nov 2009 20:04:05 -0800

Hi all,

Today a question about imbalanced data came to mind.


I'd like to know how Hadoop handles this kind of data. For example, suppose
one key dominates the map output, say 99% of the records. Then 99% of the data
set goes to a single reducer, and that reducer becomes the bottleneck.
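
To make the scenario concrete, here is a minimal sketch in plain Java (no
cluster needed) of how hash partitioning assigns keys to reducers. The
partition formula is the one used by Hadoop's default HashPartitioner; the
class name SkewDemo, the sample keys, and the reducer count of 4 are made up
for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: simulates reducer assignment outside of Hadoop.
class SkewDemo {
    // Same formula as the default HashPartitioner: mask off the sign bit,
    // then take the remainder by the number of reduce tasks.
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Counts how many records each reducer would receive.
    static int[] countPerReducer(List<String> keys, int numReducers) {
        int[] counts = new int[numReducers];
        for (String key : keys) {
            counts[partition(key, numReducers)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical map output: 990 records share one key, 10 do not.
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 990; i++) {
            keys.add("hot");
        }
        for (int i = 0; i < 10; i++) {
            keys.add("rare-" + i);
        }

        int[] counts = countPerReducer(keys, 4);
        for (int r = 0; r < counts.length; r++) {
            System.out.println("reducer " + r + ": " + counts[r] + " records");
        }
    }
}
```

Because every record with the same key hashes to the same partition, all 990
"hot" records land on one reducer no matter how many reducers are configured.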

Does Hadoop have a better way to handle such an imbalanced data set?


Jeff Zhang
