[GitHub] [beam] damccorm opened a new issue, #20534: Simplify use of the Python Portable runner for Go SDK pipelines

GitBox Sat, 04 Jun 2022 11:07:02 -0700


damccorm opened a new issue, #20534:
URL: https://github.com/apache/beam/issues/20534

It's possible to execute Go SDK pipelines on any portable Beam runner, using
the "universal" runner and specifying the endpoint of the job server. However,
this is inconvenient in some instances as it requires having a standing Job
Management server for the runner in question.

This task is to simplify using the Python Portable Runner for
arbitrary/novice Go SDK users. While for performance it's generally better to
keep a job management server around so it can execute multiple jobs, this isn't
required.

The goal would be to create a "python" runner for the Go SDK, which will
start up the python portable runner job server, and submit a pipeline to it in
Loopback mode for execution, using the "universal runner", and wait for the job
to finish.
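
The flow described above (start the job server, wait for it, submit in Loopback mode, shut down) could be sketched roughly as follows. This is only an illustration under assumptions, not the eventual runner: the helper names (`jobServerArgs`, `universalFlags`, `waitForPort`, `runWithJobServer`) and the `submit` callback are hypothetical, and the job server module and `--port` flag are assumed to match the manual steps on the Go Tips page. The sketch uses only the Go standard library; a real runner would hand the pipeline to the universal runner inside `submit`.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"os"
	"os/exec"
	"strconv"
	"time"
)

// jobServerArgs returns the Python arguments for starting the local job server.
// Assumption: module name and --port flag as used in the manual "long way" steps.
func jobServerArgs(port int) []string {
	return []string{"-m", "apache_beam.runners.portability.local_job_service_main",
		"--port", strconv.Itoa(port)}
}

// universalFlags returns the Go pipeline flags that point the universal
// runner at the given job endpoint and request Loopback execution.
func universalFlags(endpoint string) []string {
	return []string{"--runner=universal", "--endpoint=" + endpoint, "--environment_type=LOOPBACK"}
}

// waitForPort polls addr until a TCP connection succeeds or the timeout elapses.
func waitForPort(addr string, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		conn, err := net.DialTimeout("tcp", addr, 200*time.Millisecond)
		if err == nil {
			conn.Close()
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("job server at %s not reachable after %v: %w", addr, timeout, err)
		}
		time.Sleep(100 * time.Millisecond)
	}
}

// runWithJobServer starts the job server, waits until it accepts connections,
// calls submit with the endpoint, and stops the server when submit returns.
func runWithJobServer(ctx context.Context, python string, port int, submit func(endpoint string) error) error {
	ctx, cancel := context.WithCancel(ctx)
	cmd := exec.CommandContext(ctx, python, jobServerArgs(port)...)
	cmd.Stderr = os.Stderr
	if err := cmd.Start(); err != nil {
		cancel()
		return fmt.Errorf("starting job server (is Python with apache_beam installed?): %w", err)
	}
	defer func() {
		cancel()   // cancelling the context kills the job server process
		cmd.Wait() // reap it; the error caused by the kill is expected
	}()

	endpoint := net.JoinHostPort("localhost", strconv.Itoa(port))
	if err := waitForPort(endpoint, 30*time.Second); err != nil {
		return err
	}
	return submit(endpoint)
}

func main() {
	// Only print what would be run; nothing is started here.
	endpoint := net.JoinHostPort("localhost", "8099")
	fmt.Println("job server: python", jobServerArgs(8099))
	fmt.Println("pipeline flags:", universalFlags(endpoint))
}
```

Killing the server through context cancellation keeps the lifetime of the job server tied to a single pipeline run, which matches the "not a standing server" goal at the cost of paying the startup time on every execution.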

This will give Go users access to a correct runner for testing, and allow
them to develop their pipelines confidently before moving them to distributed
runners like Flink, Spark, or Dataflow.

Ideally, outside of some clearly indicated dependencies (and clear failures
when they aren't present), a user should be able to import the package, specify
--runner=python, and have their pipeline execute.

The "long way" for using the Python Portable Runner with the Go SDK is on
the [Go Tips page of the Dev wiki](https://cwiki.apache.org/confluence/display/BEAM/Go+Tips).
The Go side runner code is in
[https://github.com/apache/beam/tree/master/sdks/go/pkg/beam/runners](https://github.com/apache/beam/tree/master/sdks/go/pkg/beam/runners)

The Python Portable runner entry point is here:
[https://github.com/apache/beam/blob/3d296c42f9d9dbb7c2234dec325f6a5255b821ee/sdks/python/apache_beam/runners/portability/portable_runner.py](https://github.com/apache/beam/blob/3d296c42f9d9dbb7c2234dec325f6a5255b821ee/sdks/python/apache_beam/runners/portability/portable_runner.py)

The simplest way to do this would probably be to require that users have
Docker installed, and for the Beam project to publish a Docker container image
that can start up the Python Runner job server appropriately. This keeps the
dependencies minimal and startup consistent for users, and we can likely
re-use the technique for other purposes. A similar technique would also make
developing new SDKs easier, as new SDKs can use the same infrastructure from
the start.
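
As a rough illustration of the Docker-based approach, the Go side might assemble a `docker run` invocation along these lines. The image name `example/beam-python-job-server:latest` and the helper `dockerRunArgs` are hypothetical (no such image is named in this issue), and the sketch assumes the container's job server listens on the same port that is published to the host.

```go
package main

import (
	"fmt"
	"strings"
)

// dockerRunArgs builds the arguments for `docker run` so that a job server
// container is removed on exit (--rm) and its port is published to the host (-p).
// Assumption: the image starts the job server on the given port by itself.
func dockerRunArgs(image string, port int) []string {
	return []string{
		"run", "--rm",
		"-p", fmt.Sprintf("%d:%d", port, port),
		image,
	}
}

func main() {
	// Hypothetical image name, used only for illustration.
	args := dockerRunArgs("example/beam-python-job-server:latest", 8099)
	fmt.Println("docker " + strings.Join(args, " "))
}
```

One design point to keep in mind: in Loopback mode the runner has to connect back to the worker inside the user's Go process, so the container needs a network route to the host, not only a published port in the other direction.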

Other approaches to solve the problem are of course welcome.

Imported from Jira
[BEAM-11077](https://issues.apache.org/jira/browse/BEAM-11077). Original Jira
may contain additional context.
Reported by: lostluck.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

