[ https://issues.apache.org/jira/browse/BEAM-10583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17165944#comment-17165944 ]
Kyle Weaver commented on BEAM-10583:
------------------------------------
Note that [2] is misleading on this topic:
"Please note that if you have written your Beam pipeline in _python_ the
procedure to make it work on Databricks should look more or less the same"
This greatly oversimplifies the issue. Non-portable Beam (Java) can usually
run on any Spark cluster "out of the box"; portable Beam (Python) cannot,
because the Python SDK harness must also be installed where the workers run.
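
To illustrate the extra moving parts on the Python side, here is a rough
sketch of the pipeline flags a portable Python pipeline typically needs. The
helper name portable_spark_flags is hypothetical, and the defaults
(localhost:8099, LOOPBACK) are only assumptions suited to local testing. A
Beam Spark job server has to be started separately, and nothing here is
specific to Databricks.

```python
# Hypothetical helper: builds the flags a portable Beam Python pipeline
# usually passes so that it submits to a separately started job server.
ALLOWED_ENVIRONMENTS = {"DOCKER", "PROCESS", "EXTERNAL", "LOOPBACK"}


def portable_spark_flags(job_endpoint="localhost:8099",
                         environment_type="LOOPBACK"):
    """Return command-line style flags for the portable runner.

    job_endpoint: host:port of the running job server (assumed default).
    environment_type: how the SDK harness is launched; LOOPBACK is only
    meant for local testing, not for a real cluster.
    """
    if environment_type not in ALLOWED_ENVIRONMENTS:
        raise ValueError(f"unknown environment type: {environment_type}")
    return [
        "--runner=PortableRunner",
        f"--job_endpoint={job_endpoint}",
        f"--environment_type={environment_type}",
    ]


if __name__ == "__main__":
    # Print the flags; a real pipeline would hand them to PipelineOptions.
    print(portable_spark_flags())
```

With Beam installed, the returned list could be passed to
apache_beam.options.pipeline_options.PipelineOptions(flags). On a real
cluster the environment type would have to be one where the SDK harness is
actually reachable by the workers, which is the part the Java instructions
never have to deal with.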
> Document Beam Python on Databricks
> ----------------------------------
>
> Key: BEAM-10583
> URL: https://issues.apache.org/jira/browse/BEAM-10583
> Project: Beam
> Issue Type: Wish
> Components: runner-spark
> Reporter: Kyle Weaver
> Priority: P2
> Labels: portability-spark
>
> There are folks out there trying to run Beam Python on Databricks [1]. While
> there is documentation out there for the Java SDK [2], Python is more
> involved because the user needs to install the SDK harness.
> [1] [https://github.com/tensorflow/tfx/issues/2220]
> [2] [https://towardsdatascience.com/running-an-apache-beam-data-pipeline-on-azure-databricks-c09e521d8fc3]
--
This message was sent by Atlassian Jira
(v8.3.4#803005)