lostluck commented on issue #21234: URL: https://github.com/apache/beam/issues/21234#issuecomment-2590847576
This is overall described here: https://docs.google.com/document/d/1kj_9JWxGWOmSGeZ5hbLVDXSTv-zBrx4kQRqOq85RYD4/edit?tab=t.0#heading=h.oes73844vmhl

I'm going to summarize what the Python component is actually doing, as it's quite hard to follow the Python through the abstraction layers. The protocol supports Java-less job submission to a Flink cluster, which avoids needing to manually run a JobServer in production. This is already implemented in Python (as linked above), and it is also how the Java JobServer itself submits jobs to Flink. The doc linked above and the Python code should be followed for precision on what needs to be implemented. The following summarizes the existing flow so it can be implemented idiomatically in Go.

* Have a flink-master available somewhere.
* Receive the pipeline through the "normal" Job Management protocols.
  * This includes artifacts, dependencies, containers, and similar.
* Request the Flink version from the flink-master over REST.
  * https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/flink_uber_jar_job_server.py#L83
* Download the SDK- and Flink-version-matched Flink JobServer jar from Maven (if not already available locally, or overridden).
  * https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/flink_uber_jar_job_server.py#L78
  * https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/subprocess_server.py#L333
* Make a copy of that jar, and add the pipeline proto and artifacts into the jar.
  * https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/abstract_job_service.py#L315
* Submit that jar over REST to the Flink master.
  * https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/flink_uber_jar_job_server.py#L152
* Engage with Flink directly for the life of the job.
  * https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/flink_uber_jar_job_server.py#L107

## Nice to Have Goals

* Be sufficiently abstract to also support the same approach for Spark without too much overhead.
* Ensure this flow can be easily hosted in the standalone Prism binary. This avoids needing any specific runtime beyond the existing flow that downloads the platform-specific Prism binary for other/new SDKs; those SDKs can then automatically gain this support, on top of a robust local development experience with Prism.
