Hi all,

A question about imbalanced data came to mind today. I'd like to know how Hadoop handles this kind of data. For example, suppose one key dominates the map output, say 99% of the records. Then 99% of the data set goes to a single reducer, and that reducer becomes the bottleneck. Does Hadoop have a better way to handle such an imbalanced data set?

Jeff Zhang
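
P.S. To make the scenario concrete, here is a minimal sketch of why one dominant key ends up on one reducer. It does not use the Hadoop API; it only applies the same formula as Hadoop's default HashPartitioner, (hashCode & Integer.MAX_VALUE) % numReduceTasks, to plain Java strings. The class name SkewDemo, the method names, the key names, and the reducer count of 4 are all made up for illustration, and String.hashCode() stands in for the hash of a real Writable key.

```java
import java.util.ArrayList;
import java.util.List;

public class SkewDemo {
    // Same formula as Hadoop's default HashPartitioner, applied here to
    // String.hashCode() as a stand-in for a Writable key's hashCode().
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Count how many records each (hypothetical) reducer would receive.
    static long[] recordsPerReducer(List<String> keys, int numReducers) {
        long[] counts = new long[numReducers];
        for (String key : keys) {
            counts[partition(key, numReducers)]++;
        }
        return counts;
    }

    // Build a synthetic map output in which hotPercent of every 100 records
    // carry the same key ("hot") and the rest have distinct keys.
    static List<String> skewedKeys(int total, int hotPercent) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < total; i++) {
            keys.add(i % 100 < hotPercent ? "hot" : "key-" + i);
        }
        return keys;
    }

    public static void main(String[] args) {
        int numReducers = 4; // assumed value, for illustration only
        long[] counts = recordsPerReducer(skewedKeys(10000, 99), numReducers);
        for (int r = 0; r < numReducers; r++) {
            System.out.println("reducer " + r + ": " + counts[r] + " records");
        }
    }
}
```

Because the partition depends only on the key, every "hot" record lands on the same reducer no matter how many reducers are configured, so adding reducers alone does not spread the load.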