Hi,

We are using the old API (0.20.2) from Cloudera CDH3. When I set a combiner (simply reusing the reducer class), it runs in both the mapper and the reducer. In the mapper it aggregates only a couple of records at a time, while in the reducer it aggregates about 1000 at a time. The reducer has some overhead each time it is invoked, and that overhead becomes significant because a map task runs the reducer/combiner sequentially, once per group (that is, once for each distinct output key). Can I turn the combiner off in the mapper while keeping it on in the reducer?
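To illustrate the concern, here is a rough sketch of the arithmetic in plain Java. It is not Hadoop code and not a measurement from our job: the class name, method names, and cost figures (a fixed cost of 10 units per call and 1 unit per record) are all made up for illustration. Only the record counts (a couple per call in the mapper, about 1000 per call in the reducer) come from the description above.

```java
public class CombinerOverheadSketch {

    /**
     * Fraction of a single reduce()/combine() call that goes to fixed
     * per-call overhead rather than to processing records.
     */
    public static double overheadShare(int recordsPerCall,
                                       double fixedCostPerCall,
                                       double costPerRecord) {
        double total = fixedCostPerCall + recordsPerCall * costPerRecord;
        return fixedCostPerCall / total;
    }

    /**
     * Total cost when the function is invoked once per group
     * (one group per distinct output key), one call after another.
     */
    public static double totalCost(long groups,
                                   int recordsPerGroup,
                                   double fixedCostPerCall,
                                   double costPerRecord) {
        return groups * (fixedCostPerCall + recordsPerGroup * costPerRecord);
    }

    public static void main(String[] args) {
        // Hypothetical cost units, chosen only to show the effect.
        double fixedCostPerCall = 10.0;
        double costPerRecord = 1.0;

        // Mapper: a couple of records per call.
        double mapperShare = overheadShare(2, fixedCostPerCall, costPerRecord);
        // Reducer: about 1000 records per call.
        double reducerShare = overheadShare(1000, fixedCostPerCall, costPerRecord);

        System.out.printf("overhead share per call in the mapper:  %.3f%n", mapperShare);
        System.out.printf("overhead share per call in the reducer: %.3f%n", reducerShare);
    }
}
```

With figures like these, most of each combiner call in the mapper is overhead, while in the reducer the same overhead is spread over many records and barely matters.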
Zhu, Guojun
Modeling Sr Graduate
571-3824370
guojun_...@freddiemac.com
Financial Engineering
Freddie Mac