It's clear now. Thank you very much.

2009/7/7 Owen O'Malley <[email protected]>
> On Jul 5, 2009, at 11:34 PM, Mu Qiao wrote:
>
>> There is a property min.num.spills.for.combine specifying the minimum
>> number of spills to run combiner when merging. The default value is 3. Why
>> is there such a restriction? Wouldn't it be better to run the combiner no
>> matter how many spills there are?
>
> Clearly the combiner isn't useful if there is only 1 spill, and 3 is a guess
> about how many are necessary before the cost of applying the combiner is
> paid for by the resulting compression. Feel free to set it to 2.
>
>> The second question is why the combiner could be run at the reduce side.
>> Can't the reduce function take the place of that?
>
> The combiners are called on the reduce side only if there are enough
> spills that it requires more than a single merge before it can go to the
> reduce. (The reduce is only called once at the end.) So if the reduce has
> 1000 streams to merge, it will use the combiner on the intermediate merges
> before they are written to disk.
>
> -- Owen

-- 
Best wishes,
Qiao Mu
MOE KLINNS Lab and SKLMS Lab, Xi'an Jiaotong University
Department of Computer Science and Technology, Xi'an Jiaotong University
TEL: 15991676983
E-mail: [email protected]
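
For readers of the archive, the two rules Owen describes can be sketched in a few lines of Python. This is only a toy model, not Hadoop code: `combine`, `merge_spills`, `reduce_side_merge` and the `merge_factor` parameter are hypothetical names made up for illustration, the combiner is assumed to be a word-count style sum, and the way batches are picked for intermediate merges is deliberately simplified.

```python
import heapq
from itertools import groupby
from operator import itemgetter

# Assumption: mirrors the default of min.num.spills.for.combine from the thread.
MIN_SPILLS_FOR_COMBINE = 3


def combine(sorted_pairs):
    """Sum the values of adjacent equal keys (a word-count style combiner)."""
    return [(key, sum(value for _, value in group))
            for key, group in groupby(sorted_pairs, key=itemgetter(0))]


def merge_spills(spills, min_spills=MIN_SPILLS_FOR_COMBINE):
    """Map side: merge sorted spills, combining only if there are enough of them."""
    merged = list(heapq.merge(*spills, key=itemgetter(0)))
    if len(spills) >= min_spills:
        return combine(merged)
    return merged


def reduce_side_merge(streams, merge_factor):
    """Reduce side: combine on intermediate merges only, never on the final one.

    Returns the records handed to the reducer and the number of
    intermediate merges that were needed.
    """
    streams = list(streams)
    intermediate_merges = 0
    while len(streams) > merge_factor:
        batch, streams = streams[:merge_factor], streams[merge_factor:]
        # An intermediate result would be written back to disk, so shrinking
        # it with the combiner saves I/O on the later passes.
        streams.append(combine(list(heapq.merge(*batch, key=itemgetter(0)))))
        intermediate_merges += 1
    # The final merge feeds the reducer directly; the reduce runs once here.
    return list(heapq.merge(*streams, key=itemgetter(0))), intermediate_merges
```

With two spills and the default threshold of 3, `merge_spills` returns the merged records uncombined; passing `min_spills=2` makes it combine them. In `reduce_side_merge`, the combiner only runs when the number of streams exceeds `merge_factor`, which matches the "more than a single merge" condition in the reply above.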
