Re: MR runner

Frances Perry Tue, 16 Feb 2016 20:31:38 -0800

Beam is by definition a superset of what can be done in a traditional batch
MapReduce-style runner. Although some things like Windowing can be pretty
easily mocked in MapReduce, unbounded input collections would be pretty
hard to implement without implementing a full micro-batch-based streaming
engine. And even then, I'm not sure whether event time / processing time
tradeoffs could be implemented faithfully.
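
To illustrate the windowing point, here is a minimal sketch, not Beam code, of how fixed windows over a bounded input could be emulated in a MapReduce-style job by folding the window start into the shuffle key. The names (fixed_window_start, map_phase, shuffle, reduce_phase) and the sample records are hypothetical, and the in-memory shuffle only stands in for what a real MapReduce framework would do.

```python
from collections import defaultdict

def fixed_window_start(timestamp, size):
    # Start of the fixed window that contains this event timestamp.
    return timestamp - (timestamp % size)

def map_phase(records, window_size):
    # Emit ((key, window_start), value) so the shuffle groups per key and window.
    for key, value, timestamp in records:
        yield (key, fixed_window_start(timestamp, window_size)), value

def shuffle(pairs):
    # Stand-in for the framework's shuffle: group values by composite key.
    groups = defaultdict(list)
    for composite_key, value in pairs:
        groups[composite_key].append(value)
    return groups

def reduce_phase(groups):
    # Sum the values of each (key, window) group.
    return {composite_key: sum(values) for composite_key, values in groups.items()}

# Hypothetical bounded input: (key, value, event time in seconds).
records = [("a", 1, 5), ("a", 2, 65), ("a", 3, 10), ("b", 4, 61)]
result = reduce_phase(shuffle(map_phase(records, 60)))
```

This only works because the input is bounded: every record is available before the reduce phase runs, so there is no need to decide when a window is complete. That decision is exactly what gets hard with unbounded collections.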

Having said that, we know that not all runners will be able to support the
full Beam model, and we still encourage them anyway ;-) My guess is the
main reason that no one has tried a MapReduce-style runner is that it's a
relatively large amount of work to implement an optimizer to efficiently
transform Beam primitives into MapReduce patterns. (See the FlumeJava
<http://dl.acm.org/citation.cfm?id=1806638> paper for details on things
like parallel do fusion and combiner lifting.) It's possible there's some
code in Crunch <https://crunch.apache.org/> that could be leveraged, though
I don't know the details. I've gone ahead and created a JIRA issue
<https://issues.apache.org/jira/browse/BEAM-19> to track interest.
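
As a rough illustration of the two optimizations named above, here is a minimal sketch, not taken from FlumeJava, Crunch, or Beam. It assumes element-wise steps that each return an iterable of outputs, and a combine function that is associative and commutative. All names (fuse, map_with_lifted_combiner, reduce_partials) and the word-count data are hypothetical.

```python
def fuse(*fns):
    # Parallel-do fusion: run consecutive element-wise steps as one function,
    # so they can execute in a single map phase with no intermediate shuffle.
    def fused(element):
        outputs = [element]
        for fn in fns:
            outputs = [out for item in outputs for out in fn(item)]
        return outputs
    return fused

def map_with_lifted_combiner(shard, fused_fn, combine):
    # Combiner lifting: pre-aggregate per key inside the map task, so that
    # only one partial value per key leaves the shard.
    partial = {}
    for element in shard:
        for key, value in fused_fn(element):
            partial[key] = combine(partial[key], value) if key in partial else value
    return partial

def reduce_partials(partials, combine):
    # Reduce side: merge the per-shard partial results with the same combine.
    merged = {}
    for partial in partials:
        for key, value in partial.items():
            merged[key] = combine(merged[key], value) if key in merged else value
    return merged

def split_words(line):
    return line.split()

def pair_with_one(word):
    return [(word, 1)]

def add(a, b):
    return a + b

# Hypothetical input: two map shards, each holding lines of text.
shards = [["a b a"], ["b a"]]
fused = fuse(split_words, pair_with_one)
partials = [map_with_lifted_combiner(shard, fused, add) for shard in shards]
counts = reduce_partials(partials, add)
```

The hard part for a real runner is not these helpers but deciding automatically, for an arbitrary pipeline graph, which steps can be fused and which combines can be lifted. That is the optimizer work referred to above.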

Frances

On Tue, Feb 16, 2016 at 7:00 PM, lonely Feb <[email protected]> wrote:

> I wonder why Beam has not considered a traditional MR runner (i.e., Hadoop);
> it must be popular in the community.
>
