[jira] [Commented] (BEAM-6296) Support Python Spark Runner

JIRA Thu, 27 Dec 2018 06:47:31 -0800


    [ https://issues.apache.org/jira/browse/BEAM-6296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16729654#comment-16729654 ]


Ismaël Mejía commented on BEAM-6296:
------------------------------------

Hi, thanks for the suggestion; this is on the agenda for next year. So far we 
are working on a rewrite of the Spark runner based on the Dataset API. Soon 
after that, the idea is to focus on the portable translation. You can follow 
the progress in BEAM-2891, or if you want to be part of the action, ping us at 
#beam-spark on Slack.


> Support Python Spark Runner
> ---------------------------
>
>                 Key: BEAM-6296
>                 URL: https://issues.apache.org/jira/browse/BEAM-6296
>             Project: Beam
>          Issue Type: New Feature
>          Components: runner-spark
>    Affects Versions: 2.9.0
>            Reporter: Lei (Eddy) Xu
>            Assignee: Amit Sela
>            Priority: Major
>              Labels: python
>             Fix For: Not applicable
>
>
> Hello, everyone,
> It would be great to have a Spark runner available to the Python SDK.
> While we are happy running Apache Beam on Dataflow, there are a few use 
> cases that require different dependencies and OS environments, which makes it 
> more appropriate to run them on a self-managed Spark cluster. With a Spark 
> runner for the Python SDK, we would have the option to unify the language 
> used to define data pipelines.
> We would like to hear the community's feedback on this feature.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
