John, that sounds very interesting, and I may implement such a workflow, but can I write back to HDFS from the mapper? In the reducer it is a standard context.write(), but the mapper's Context is a different class.
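If I understand the API right, something like the sketch below is what I'm picturing: the mapper's Context has a write() of its own for the normal shuffle path, and for the direct-to-HDFS part I'd open a FileSystem stream in setup(). (Class and path names here are made up, and this is untested.)

import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class SideWritingMapper
        extends Mapper<LongWritable, Text, Text, Text> {

    private FSDataOutputStream sideOut;

    @Override
    protected void setup(Context context) throws IOException {
        FileSystem fs = FileSystem.get(context.getConfiguration());
        // One file per task attempt, so parallel mappers (and retried
        // attempts) never collide on the same path.
        Path sidePath = new Path("/tmp/side-output",
                context.getTaskAttemptID().toString());
        sideOut = fs.create(sidePath);
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Mapper.Context supports write() too; this record goes
        // through the normal sort/shuffle to the reducers.
        context.write(new Text("key"), value);

        // This bypasses the framework and goes straight to HDFS.
        sideOut.write(value.getBytes(), 0, value.getLength());
        sideOut.write('\n');
    }

    @Override
    protected void cleanup(Context context) throws IOException {
        sideOut.close();
    }
}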
Thank you,
Mark

On Mon, Jun 18, 2012 at 9:24 AM, John Armstrong <j...@ccri.com> wrote:
> On 06/18/2012 10:19 AM, Mark Kerzner wrote:
>> If only reducers could be told to start their work on the first maps
>> that they see, my processing would begin to show results much earlier,
>> before all the mappers are done.
>
> The sort/shuffle phase isn't just about ordering the keys; it's about
> collecting all the results of the map phase that share a key together
> for the reducers to work on. If your reducer can operate on mapper
> outputs independently of each other, then it sounds like it's really
> another mapper: it should either be factored into your existing mapper
> or rewritten as a mapper of its own, with both mappers chained via
> ChainMapper (if you're using the older API).
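P.S. For anyone following along, I take the ChainMapper wiring John describes to look roughly like this under the old org.apache.hadoop.mapred API (FirstMapper and SecondMapper are placeholder classes, and this is untested):

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.ChainMapper;

public class ChainedJob {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(ChainedJob.class);
        conf.setJobName("chained-mappers");

        // FirstMapper: LongWritable/Text -> Text/Text
        ChainMapper.addMapper(conf, FirstMapper.class,
                LongWritable.class, Text.class, Text.class, Text.class,
                true, new JobConf(false));

        // SecondMapper consumes FirstMapper's output directly:
        // Text/Text -> Text/Text
        ChainMapper.addMapper(conf, SecondMapper.class,
                Text.class, Text.class, Text.class, Text.class,
                true, new JobConf(false));

        // Input/output formats, paths, and reducer setup omitted.
        JobClient.runJob(conf);
    }
}

Both map implementations then run back-to-back inside each map task, so there is no extra sort/shuffle between them.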