Hi everyone, I've been learning Hadoop recently and I'm confused about the combiner mechanism.
There is a property, min.num.spills.for.combine, that specifies the minimum number of spills required for the combiner to run during the merge. The default value is 3. Why is there such a restriction? Wouldn't it be better to run the combiner no matter how many spills there are?

My second question is why the combiner can also be run on the reduce side. Can't the reduce function take its place?

Thanks very much.

--
Best wishes,
乔木
MOE KLINNS Lab and SKLMS Lab, Xi'an Jiaotong University
Department of Computer Science and Technology, Xi’an Jiaotong University
TEL: 15991676983
E-mail: [email protected]
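
P.S. To make the first question concrete, here is a minimal Python sketch of the behaviour as I understand it from the property description. It is not Hadoop's actual code: the function names, the in-memory lists standing in for spill files, and the word-count style combiner are all made up for illustration.

```python
from collections import defaultdict

# Default of min.num.spills.for.combine, as stated above.
MIN_SPILLS_FOR_COMBINE = 3


def combine(records):
    """Hypothetical word-count style combiner: sum the values of each key."""
    totals = defaultdict(int)
    for key, value in records:
        totals[key] += value
    return sorted(totals.items())


def merge_spills(spills, min_spills=MIN_SPILLS_FOR_COMBINE):
    """Merge spills into one sorted list of records.

    The combiner is applied to the merged records only when the number
    of spills reaches the threshold; otherwise the merge is a plain merge.
    """
    merged = sorted(record for spill in spills for record in spill)
    if len(spills) < min_spills:
        return merged           # too few spills: combiner is skipped
    return combine(merged)      # enough spills: combiner runs on the merge


two_spills = [[("a", 1), ("b", 1)], [("a", 1)]]
three_spills = two_spills + [[("b", 2)]]

print(merge_spills(two_spills))    # duplicate keys remain
print(merge_spills(three_spills))  # keys are collapsed by the combiner
```

With two spills the records for "a" stay separate, while with three spills they are summed. My question is why the first case is not combined as well.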
