[ https://issues.apache.org/jira/browse/BEAM-6296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16729654#comment-16729654 ]
Ismaël Mejía commented on BEAM-6296:
------------------------------------
Hi, thanks for the suggestion. This is on the agenda for next year. So far we
are working on a rewrite of the Spark runner based on the Dataset API. Soon
after that, the idea is to focus on the portable translation. You can follow
the progress in BEAM-2891 or, if you want to be part of the action, ping us at
#beam-spark in Slack.
> Support Python Spark Runner
> ---------------------------
>
> Key: BEAM-6296
> URL: https://issues.apache.org/jira/browse/BEAM-6296
> Project: Beam
> Issue Type: New Feature
> Components: runner-spark
> Affects Versions: 2.9.0
> Reporter: Lei (Eddy) Xu
> Assignee: Amit Sela
> Priority: Major
> Labels: python
> Fix For: Not applicable
>
>
> Hello, everyone,
> It would be great to have the Spark runner available to the Python SDK.
> While we are happy running Apache Beam on Dataflow, there are a few use
> cases that require different dependencies and OS environments, which makes
> it more appropriate to run on a self-managed Spark cluster. With a Spark
> runner for the Python SDK, there would be an option to unify the language
> used to define data pipelines.
> We would like to hear the community's feedback on this feature.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)