The combiner runs when the framework is spilling intermediate output to disk. So the flow looks like:

in map:
- map writes into a buffer
- when the buffer is "full", do a quick sort, combine, and write to disk
- merge sort the partial outputs from disk, combine, and write to disk

in reduce:
- fetch output from the maps into a buffer
- when the buffer is "full", do a merge sort, combine, and write to disk
- merge sort the partial outputs and feed them to the reduce

So in general you'll have as many combines as the framework needs spills to disk. It all depends on the data sizes. The case of zero combines is rare, but it can happen if a partition has only a single value in it (because that value is very, very large). -- Owen
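To make the map-side flow concrete, here is a toy simulation, not Hadoop code: the buffer limit, the word-count-style combiner, and the record set are all invented for illustration. It buffers records, sorts and combines on each spill, then merge-sorts the spills with one final combine pass, mirroring the steps above:

```python
import heapq

BUFFER_LIMIT = 4  # toy threshold standing in for the real buffer size


def combine(sorted_pairs):
    """Sum the values of runs of equal keys (a word-count style combiner)."""
    out = []
    for k, v in sorted_pairs:
        if out and out[-1][0] == k:
            out[-1] = (k, out[-1][1] + v)
        else:
            out.append((k, v))
    return out


def map_side(records):
    """Simulate the map side: buffer records, quick-sort + combine on each
    spill, then merge-sort the spills from "disk" with one more combine."""
    buffer, spills = [], []
    for pair in records:
        buffer.append(pair)
        if len(buffer) >= BUFFER_LIMIT:            # buffer "full": spill
            spills.append(combine(sorted(buffer)))
            buffer = []
    if buffer:                                     # final partial spill
        spills.append(combine(sorted(buffer)))
    merged = heapq.merge(*spills)                  # merge sort the spills
    return combine(list(merged)), len(spills)
```

With six input records and a buffer of four, the combiner runs three times here: once per spill and once on the merged output. With input small enough to fit in one spill, it would run only once.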
