chamikaramj commented on a change in pull request #14189:
URL: https://github.com/apache/beam/pull/14189#discussion_r599687746
##########
File path: sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
##########
@@ -740,6 +719,25 @@ def _apply_sdk_environment_overrides(
new_payload.container_image = new_container_image
environment.payload = new_payload.SerializeToString()
+ # De-dup environments by Docker container image since currently Dataflow
Review comment:
To clarify, the restriction already exists for Dataflow. We currently
start one SDK harness per container image and de-dup here:
https://github.com/apache/beam/blob/83bd5485047373ae0e380c54063e3769874a8b09/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py#L304
This change just moves the de-duping from the container images started for
Dataflow to the environments in the proto, since I'm trying to update Dataflow
to map work items to environments based on the environment ID (not the
container image).
I can try to limit the de-duping to Java environments in multi-language
pipelines, since multiple Python environments do not seem to be running into
issues currently. Multiple Java environments in multi-language pipelines run
into dependency conflicts. Does that help?
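For illustration, here is a rough sketch of what de-duping environments by
container image could look like. It is not the code in this PR: the
`Environment` dataclass, `dedup_environments_by_image`, and the
`transform_env_ids` mapping are hypothetical stand-ins for the pipeline proto's
environments and transforms, and the first environment seen for an image is
assumed to be the one that is kept.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Environment:
  """Hypothetical stand-in for an environment entry in the pipeline proto."""
  env_id: str
  container_image: str


def dedup_environments_by_image(
    envs: List[Environment],
    transform_env_ids: Dict[str, str],
) -> Tuple[List[Environment], Dict[str, str]]:
  """Keeps one environment per container image and remaps transforms to it.

  Assumption: the first environment seen for an image becomes the canonical
  one; later environments with the same image are dropped.
  """
  canonical_by_image: Dict[str, str] = {}  # container image -> kept env ID
  remap: Dict[str, str] = {}  # original env ID -> kept env ID
  kept: List[Environment] = []

  for env in envs:
    canonical = canonical_by_image.setdefault(env.container_image, env.env_id)
    remap[env.env_id] = canonical
    if canonical == env.env_id:
      kept.append(env)

  # Point every transform at the surviving environment for its image.
  remapped = {t: remap[e] for t, e in transform_env_ids.items()}
  return kept, remapped
```

With this shape, work items can be keyed by environment ID while there is still
only one environment (and so one SDK harness) per container image.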