cliffsun91 opened a new issue, #35527:
URL: https://github.com/apache/beam/issues/35527

   ### What would you like to happen?
   
   We're currently evaluating the Beam YAML API running on Flink (in k8s). 
For the Python SDK harness we can specify --environment_type=EXTERNAL when 
submitting the pipeline (we run the Python SDK harness as a sidecar container 
alongside the Flink task manager). This all works fine if we submit a simple 
pipeline that doesn't use a transform that requires a Java provider (which I 
believe requires the expansion service). But as soon as we submit a pipeline 
that uses one (such as a `Join` or `Sql`), Beam tries to start a Docker 
container with the Beam Java SDK in our Flink task manager, and there doesn't 
seem to be a way to tell it _not_ to do that (or at least, if there is, it's 
not documented anywhere). We don't want to resort to DinD or DooD because of 
the security implications of doing this within a Kubernetes cluster in 
production, and because of the unnecessary complexity it adds.
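
For illustration, the setup described above corresponds roughly to the following submission flags. This is only a sketch: the helper name, the worker pool address `localhost:50000`, and the Flink master address are assumptions, not values taken from this report.

```python
def external_harness_args(worker_pool_address, flink_master):
    """Build pipeline options that point Beam at an already-running
    Python SDK harness (e.g. a sidecar) instead of starting Docker."""
    return [
        "--runner=FlinkRunner",
        f"--flink_master={flink_master}",
        "--environment_type=EXTERNAL",
        # For EXTERNAL, environment_config is the address of the worker pool.
        f"--environment_config={worker_pool_address}",
    ]


# Hypothetical addresses, for illustration only.
args = external_harness_args("localhost:50000", "flink-jobmanager:8081")
print(" ".join(args))
```

These options only cover the Python environment; they do not affect how the environment for Java-provided transforms is started, which is the gap this issue is about.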
   
   There's an example here showing that, if you're using the Beam Python SDK 
directly, you can specify how to start the expansion service so that it 
doesn't try to use Docker: 
https://github.com/lydian/beam-python-flink-runner-examples/blob/master/docker/src/example.py#L47
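
As a rough sketch of the approach in that example (not a confirmed recipe for the YAML API): with the Python SDK, extra arguments can be appended to the Java expansion service so that it uses a PROCESS environment rather than Docker. The helper name and the boot path `/opt/apache/beam_java/boot` below are assumptions and would need to match wherever the Java SDK boot binary lives in the task manager image.

```python
import json


def process_env_expansion_args(boot_command):
    """Build expansion-service arguments that select a PROCESS environment,
    so the Java SDK harness is started as a local process, not a container."""
    env_config = json.dumps({"command": boot_command})
    return [
        "--defaultEnvironmentType=PROCESS",
        f"--defaultEnvironmentConfig={env_config}",
    ]


# Assumed location of the Java SDK boot binary (hypothetical).
append_args = process_env_expansion_args("/opt/apache/beam_java/boot")

# With the Python SDK, these would be passed to an expansion service helper,
# for example (requires apache_beam; shown as a comment to stay self-contained):
#   from apache_beam.io.kafka import default_io_expansion_service
#   service = default_io_expansion_service(append_args=append_args)
print(append_args)
```

Whether the same arguments can be supplied when the pipeline is submitted through Beam YAML is exactly what this issue asks to have documented or supported.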
   
   If there is a way to do this when submitting via the YAML API, it would be 
good to have it documented clearly (I've not found anything after extensive 
searching). If not, could we consider supporting this?
   
   ### Issue Priority
   
   Priority: 2 (default / most feature requests should be filed as P2)
   
   ### Issue Components
   
   - [ ] Component: Python SDK
   - [ ] Component: Java SDK
   - [ ] Component: Go SDK
   - [ ] Component: Typescript SDK
   - [ ] Component: IO connector
   - [ ] Component: Beam YAML
   - [ ] Component: Beam examples
   - [ ] Component: Beam playground
   - [ ] Component: Beam katas
   - [ ] Component: Website
   - [ ] Component: Infrastructure
   - [ ] Component: Spark Runner
   - [ ] Component: Flink Runner
   - [ ] Component: Samza Runner
   - [ ] Component: Twister2 Runner
   - [ ] Component: Hazelcast Jet Runner
   - [ ] Component: Google Cloud Dataflow Runner

