ihji commented on a change in pull request #13283:
URL: https://github.com/apache/beam/pull/13283#discussion_r521580860



##########
File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
##########
@@ -542,19 +593,32 @@ def run_pipeline(self, pipeline, options):
       # TODO(chamikara): remove following pipeline and pipeline proto recreation
       # after portable job submission path is fully in place.
       from apache_beam import Pipeline
-      pipeline = Pipeline.from_runner_api(
+      pipeline, src_context = Pipeline.from_runner_api(
           self.proto_pipeline,
           pipeline.runner,
           options,
+          return_context=True,
           allow_proto_holders=True)
 
       # Pipelines generated from proto do not have output set to PDone set for
       # leaf elements.
       pipeline.visit(self._set_pdone_visitor(pipeline))
 
+      from apache_beam.runners import pipeline_context
+      dst_context = pipeline_context.PipelineContext(
+          component_id_map=pipeline.component_id_map,
+          default_environment=self._default_environment)
+
+      # Copy external environments to prevent dangling environment ids
+      pipeline.visit(

Review comment:
       A Python pipeline that uses a Python external transform doesn't work
unless dangling environment ids are handled.
   
   The cause is missing environments, not duplicated ones. Dangling 
environment ids are generated when 1) the PTransform URN from the external 
environment is known to the SDK (so `RunnerAPIPTransformHolder` won't work as 
expected) and 2) the PipelineContext from the rehydration process is not used 
in the second proto conversion. Python external transforms in Python pipelines 
are rehydrated to `AppliedPTransform` because the external PTransform URN is 
known to the SDK; however, `AppliedPTransform` keeps only an environment id, 
not the environment itself. Environments are stored separately in the 
PipelineContext, but we dropped the PipelineContext from rehydration in the 
second proto conversion.
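
To make the failure mode concrete, here is a minimal sketch in plain Python. It is a toy model, not Beam code: the dicts stand in for the transform and environment maps of a pipeline proto, and the helper names `dangling_environment_ids` and `copy_missing_environments` are hypothetical, as are the transform names and environment ids. It only illustrates why an id-only reference dangles when the source context is dropped, and how copying the referenced environments into the destination context resolves it.

```python
# Toy model only: plain dicts stand in for pipeline proto components.
# None of the names below are Beam APIs; they are assumptions for illustration.


def dangling_environment_ids(transforms, environments):
    """Return ids that transforms reference but `environments` does not define."""
    referenced = {
        t["environment_id"] for t in transforms.values() if t.get("environment_id")
    }
    return sorted(referenced - set(environments))


def copy_missing_environments(transforms, src_environments, dst_environments):
    """Return dst_environments plus every referenced environment found in src."""
    merged = dict(dst_environments)
    for env_id in dangling_environment_ids(transforms, dst_environments):
        if env_id in src_environments:
            merged[env_id] = src_environments[env_id]
    return merged


# Context produced by rehydration: it still holds the external environment.
src_environments = {
    "default_env": {"payload": "default SDK harness"},
    "external_env": {"payload": "external transform harness"},
}

# Rehydrated transforms keep only an environment id, not the environment.
transforms = {
    "Read": {"environment_id": "default_env"},
    "ExternalPythonTransform": {"environment_id": "external_env"},
}

# A fresh context for the second conversion knows only the default environment.
dst_environments = {"default_env": {"payload": "default SDK harness"}}

# Before copying, the external transform's environment id dangles.
print(dangling_environment_ids(transforms, dst_environments))

# After copying from the source context, nothing dangles.
dst_environments = copy_missing_environments(
    transforms, src_environments, dst_environments)
print(dangling_environment_ids(transforms, dst_environments))
```

In this sketch, carrying `src_environments` forward plays the role that `src_context` plays in the diff above; the real change works through a pipeline visitor rather than a dict merge.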
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

