In the current framework, each map task creates one combiner object per partition per spill.
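To put the scale in concrete terms, here is a rough back-of-envelope sketch. It assumes, per the description here, that a streaming job launches one external combiner process per partition per spill, that the number of partitions equals the number of reduce tasks, and that every map task spills the same number of times. The class name and the job sizes in `main` are hypothetical and illustrative, not measurements.

```java
// Hypothetical back-of-envelope estimate of combiner process launches.
// Assumes one combiner (and so one external process, for a stream combiner)
// per partition per spill, and the same spill count for every map task.
public class CombinerLaunchEstimate {

    // Combiner processes launched by a single map task.
    static long launchesPerMapTask(long spills, long partitions) {
        return spills * partitions;
    }

    // Combiner processes launched across all map tasks of a job.
    static long launchesPerJob(long mapTasks, long spills, long partitions) {
        return mapTasks * launchesPerMapTask(spills, partitions);
    }

    public static void main(String[] args) {
        // Illustrative job shape only, not data from a real cluster.
        long mapTasks = 1000;
        long spills = 3;
        long partitions = 100; // one partition per reduce task

        // Prints "per map task: 300" and "per job: 300000".
        System.out.println("per map task: " + launchesPerMapTask(spills, partitions));
        System.out.println("per job: " + launchesPerJob(mapTasks, spills, partitions));
    }
}
```

Even with these modest numbers, the job would start hundreds of thousands of short-lived processes, each paying its own startup cost before combining any records.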
This is very costly, because each time a combiner is created, a new process is launched to run the combiner executable. I suspect that a job with a stream combiner may not run much faster than the same job without one; it may even be slower. I therefore doubt the value of supporting this feature.

I would like to know who uses stream combiners in real applications, and how they use them. Could those uses be satisfied instead by a set of generic combiners provided by the framework (such as Abacus)?

Thoughts?

Runping