Re: Beam spark 2.x runner status

Ismaël Mejía Wed, 15 Mar 2017 07:26:20 -0700

BIG +1 JB,

If we can just bump the version number with minor changes, staying as
close as possible to the current implementation for Spark 1, we can go
faster and offer, in principle, the exact same support for version 2.


I know that the advanced streaming stuff based on the Dataset API
won't be there, but with this common canvas the community can iterate
on a Dataset-based translator at the same time. In particular, I think
the most important thing is that the Spark 2 branch should not live
for a long time; it should be merged into master really fast, for the
benefit of everybody.

Ismaël


On Wed, Mar 15, 2017 at 1:57 PM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> Hi Amit,
>
> What do you think of the following:
>
> - while you reintroduce the Spark 2 branch, what about
> "extending" the version in the current Spark runner? Still using
> RDD/DStream, I think we can support Spark 2.x even if we don't yet leverage
> the newly provided features.
>
> Thoughts?
>
> Regards
> JB
>
>
> On 03/15/2017 07:39 PM, Amit Sela wrote:
>>
>> Hi Cody,
>>
>> I will re-introduce this branch soon as part of the work on BEAM-913
>> <https://issues.apache.org/jira/browse/BEAM-913>.
>> For now, and from previous experience with the mentioned branch, the batch
>> implementation should be straightforward.
>> The only issue is with streaming support: in the current runner (Spark 1.x)
>> we have experimental support for windows/triggers and we're working towards
>> full streaming support.
>> With Spark 2.x, there is no "general-purpose" stateful operator for the
>> Dataset API, so I was waiting to see if the new operator
>> <https://github.com/apache/spark/pull/17179> planned for the next version
>> could help with that.
>>
>> To summarize, I will introduce a skeleton for the Spark 2 runner with
>> batch support as soon as I can, as a separate branch.
>>
>> Thanks,
>> Amit
>>
>> On Wed, Mar 15, 2017 at 9:07 AM Cody Innowhere <e.neve...@gmail.com>
>> wrote:
>>
>>> Hi guys,
>>> Is there anybody who's currently working on a Spark 2.x runner? An old PR
>>> for the Spark 2.x runner was closed a few days ago, so I wonder what the
>>> status is now. Is there a roadmap for this?
>>> Thanks~
>>>
>>
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
