In the current framework, each mapper task creates one combiner object
per partition per spill.

This is very costly: each time a combiner is created, a new process is
launched to run the combiner executable. I suspect a job with a stream
combiner may not run much faster than one without it.

It may even be slower. Thus, I doubt the value of supporting such a feature.
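
To put a rough number on the concern, here is a back-of-the-envelope sketch.
The job sizes are made up for illustration, not measured, and the function
name is hypothetical. It assumes every spill holds data for every partition,
so one combiner process is launched per partition per spill.

```python
def combiner_processes(mappers: int, partitions: int, spills_per_mapper: int) -> int:
    """Combiner processes launched if one is spawned per partition per spill."""
    return mappers * partitions * spills_per_mapper


# Hypothetical job: 100 map tasks, 20 partitions, 5 spills per map task.
total = combiner_processes(100, 20, 5)
print(total)  # 10000
```

Even if each launch costs only a few milliseconds, that overhead is multiplied
by every partition of every spill.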


I want to know who uses stream combiners in real applications and how they
use them. Could these uses be satisfied by the framework providing a set of
generic combiners (such as Abacus)?

 

Thoughts?

 

Runping

 
