KevinGG commented on a change in pull request #11141: [BEAM-7923] Include side 
effects in p.run
URL: https://github.com/apache/beam/pull/11141#discussion_r393352768
 
 

 ##########
 File path: sdks/python/apache_beam/runners/interactive/pipeline_instrument.py
 ##########
 @@ -418,10 +420,16 @@ def visit_transform(self, transform_node):
                       tuple(ie.current_env().options.capturable_sources)):
           unbounded_source_pcolls.update(transform_node.outputs.values())
         cacheable_inputs.update(self._pin._cacheable_inputs(transform_node))
+        ins, outs = self._pin._all_inputs_outputs(transform_node)
+        all_inputs.update(ins)
+        all_outputs.update(outs)
 
     v = InstrumentVisitor(self)
     self._pipeline.visit(v)
 
+    # Every output PCollection that is never used as an input PCollection is
+    # considered as a side effect of the pipeline run and should be included.
+    self._extended_targets.update(all_outputs.difference(all_inputs))
 
 Review comment:
   It's not necessary. The intended behavior is not ambiguous: When the user 
uses `show`, `head`, `collect` APIs, these PCollections are excluded completely 
as the user explicitly wishes. And when the user invokes `p.run()`, all 
transforms in the pipeline should be executed as expected.
   
   This change is only to make sure that the prune logic doesn't affect the 
above intended behavior.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

Reply via email to