Beam is by definition a superset of what can be done in a traditional batch MapReduce-style runner. Although some things like Windowing can be pretty easily mocked in MapReduce, unbounded input collections would be pretty hard to implement without implementing a full micro-batch-based streaming engine. And even then, I'm not sure whether event time / processing time tradeoffs could be implemented faithfully.
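To illustrate why windowing is the "easy" part, here is a minimal sketch of emulating fixed (tumbling) windows in a plain map/shuffle/reduce pipeline. It assumes bounded input, integer event timestamps, and a per-key sum. All names (`assign_fixed_window`, `map_phase`, `windowed_sum`, etc.) are hypothetical illustrations, not Beam or Hadoop APIs. The trick is to fold the window into the shuffle key, so grouping by key also groups by window. Nothing here addresses triggers, watermarks, late data, or the event-time/processing-time questions above, which is exactly where a batch engine falls short.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical input element: (key, event_timestamp, value)
Element = Tuple[str, int, int]
# Shuffle key: (key, window_start)
WindowedKey = Tuple[str, int]


def assign_fixed_window(timestamp: int, size: int) -> int:
    """Return the start of the fixed window of length `size` containing `timestamp`."""
    return timestamp - (timestamp % size)


def map_phase(records: Iterable[Element], size: int) -> List[Tuple[WindowedKey, int]]:
    # Emit ((key, window_start), value) so the window becomes part of the grouping key.
    return [((key, assign_fixed_window(ts, size)), value) for key, ts, value in records]


def shuffle(pairs: Iterable[Tuple[WindowedKey, int]]) -> Dict[WindowedKey, List[int]]:
    # Simulates MapReduce's group-by-key between the map and reduce phases.
    groups: Dict[WindowedKey, List[int]] = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return dict(groups)


def reduce_phase(groups: Dict[WindowedKey, List[int]]) -> Dict[WindowedKey, int]:
    # One reduce call per (key, window): here, a simple sum.
    return {k: sum(vs) for k, vs in groups.items()}


def windowed_sum(records: Iterable[Element], size: int) -> Dict[WindowedKey, int]:
    return reduce_phase(shuffle(map_phase(records, size)))


if __name__ == "__main__":
    events = [("a", 3, 1), ("a", 7, 2), ("a", 12, 5), ("b", 4, 10)]
    print(windowed_sum(events, size=10))
```

With 10-unit windows, the two early "a" events land in the window starting at 0 and are summed together, while the third "a" event gets its own window starting at 10.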
Having said that, we know that not all runners will be able to support the full Beam model, and we still encourage them anyway ;-) My guess is that the main reason no one has tried a MapReduce-style runner is that it's a relatively large amount of work to implement an optimizer that efficiently transforms Beam primitives into MapReduce patterns. (See the FlumeJava paper <http://dl.acm.org/citation.cfm?id=1806638> for details on techniques like ParallelDo fusion and combiner lifting.) It's possible there's some code in Crunch <https://crunch.apache.org/> that could be leveraged, though I don't know the details. I've gone ahead and created a JIRA issue <https://issues.apache.org/jira/browse/BEAM-19> to track interest.

Frances

On Tue, Feb 16, 2016 at 7:00 PM, lonely Feb <[email protected]> wrote:

> I wonder why beam has not consider a traditional MR runner (i.e, hadoop),
> it must be popular in community.
