lostluck commented on issue #21234: URL: https://github.com/apache/beam/issues/21234#issuecomment-2590847576
This is overall described here: https://docs.google.com/document/d/1kj_9JWxGWOmSGeZ5hbLVDXSTv-zBrx4kQRqOq85RYD4/edit?tab=t.0#heading=h.oes73844vmhl

I'm going to summarize what the Python component is actually doing, as it's quite hard to follow the Python through the abstraction layers. The protocol supports Java-less job submission to a Flink cluster, which avoids needing to manually run a JobServer in production. This is already implemented in Python (as linked above), and it is also how the Java JobServer itself submits jobs to Flink. The doc linked above and the Python code should be followed for precision on what needs to be implemented. The following summarizes the existing flow so it can be implemented idiomatically in Go.

* Have a flink-master available somewhere.
* Receive the pipeline through the "normal" Job Management protocols.
  * This includes artifacts, dependencies, containers, and similar.
* Request the Flink version from the flink-master over REST.
  * https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/flink_uber_jar_job_server.py#L83
* Download the SDK- and Flink-version-matched Flink JobServer jar from Maven (if not already available locally, or overridden).
  * https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/flink_uber_jar_job_server.py#L78
  * https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/subprocess_server.py#L333
* Make a copy of that jar, and add the pipeline proto and artifacts into the jar.
  * https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/abstract_job_service.py#L315
* Submit that jar over REST to the Flink master.
  * https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/flink_uber_jar_job_server.py#L152
* Engage with Flink directly for the life of the job.
  * https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/flink_uber_jar_job_server.py#L107

## Nice to Have Goals

* Be sufficiently abstract to also support the same approach for Spark without too much overhead.
* Ensure this flow can be easily hosted in the standalone Prism binary. This avoids needing any specific runtime beyond the existing flow that downloads the platform-specific Prism binary for other/new SDKs; those SDKs can then automatically gain this support, on top of a robust local development experience with Prism.
