KevinGG commented on a change in pull request #11141: [BEAM-7923] Include side
effects in p.run
URL: https://github.com/apache/beam/pull/11141#discussion_r393352768
##########
File path: sdks/python/apache_beam/runners/interactive/pipeline_instrument.py
##########
@@ -418,10 +420,16 @@ def visit_transform(self, transform_node):
tuple(ie.current_env().options.capturable_sources)):
unbounded_source_pcolls.update(transform_node.outputs.values())
cacheable_inputs.update(self._pin._cacheable_inputs(transform_node))
+ ins, outs = self._pin._all_inputs_outputs(transform_node)
+ all_inputs.update(ins)
+ all_outputs.update(outs)
v = InstrumentVisitor(self)
self._pipeline.visit(v)
+ # Every output PCollection that is never used as an input PCollection is
+ # considered as a side effect of the pipeline run and should be included.
+ self._extended_targets.update(all_outputs.difference(all_inputs))
Review comment:
It's not necessary. The intended behavior is unambiguous: when the user
calls the `show`, `head`, or `collect` APIs, these PCollections are excluded
completely, as the user explicitly requested. When the user invokes `p.run()`,
all transforms in the pipeline should be executed as expected.
This change only ensures that the pruning logic doesn't interfere with that
intended behavior.
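For illustration, here is a minimal sketch of the set difference the diff computes, using a toy pipeline described as a plain dict. The `transforms` dict, its labels, and `find_side_effect_outputs` are hypothetical and not part of Beam; the real code collects inputs and outputs through the pipeline visitor.

```python
# Hypothetical stand-in for a pipeline: each transform label maps to
# (input PCollection names, output PCollection names). Not a Beam API.
transforms = {
    'Read': (set(), {'lines'}),
    'Split': ({'lines'}, {'words'}),
    'Count': ({'words'}, {'counts'}),
    'Write': ({'counts'}, {'write_result'}),
    'Log': ({'words'}, {'log_result'}),
}


def find_side_effect_outputs(transforms):
    """Return outputs that no transform ever consumes as an input."""
    all_inputs, all_outputs = set(), set()
    for ins, outs in transforms.values():
        all_inputs.update(ins)
        all_outputs.update(outs)
    # Same operation as in the diff: outputs minus inputs.
    return all_outputs.difference(all_inputs)


side_effects = find_side_effect_outputs(transforms)
print(sorted(side_effects))
```

In this toy graph only the outputs of `Write` and `Log` are never consumed, so they are the ones that would be added to the extended targets and kept through pruning when `p.run()` is invoked.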