[
https://issues.apache.org/jira/browse/BEAM-11355?focusedWorklogId=518065&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-518065
]
ASF GitHub Bot logged work on BEAM-11355:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 30/Nov/20 18:15
Start Date: 30/Nov/20 18:15
Worklog Time Spent: 10m
Work Description: robertwb commented on a change in pull request #13432:
URL: https://github.com/apache/beam/pull/13432#discussion_r532798665
##########
File path:
sdks/python/apache_beam/runners/portability/fn_api_runner/translations.py
##########
@@ -557,8 +557,13 @@ def add_parent(child, parent):
pipeline_proto.components.transforms[parent])
copy_output_pcollections(components.transforms[parent])
del components.transforms[parent].subtransforms[:]
- add_parent(parent, parents.get(parent))
+ # Ensure that child is the lsat item in the parent's subtransforms.
Review comment:
s/lsat/last/
##########
File path:
sdks/python/apache_beam/runners/portability/fn_api_runner/translations.py
##########
@@ -557,8 +557,13 @@ def add_parent(child, parent):
pipeline_proto.components.transforms[parent])
copy_output_pcollections(components.transforms[parent])
del components.transforms[parent].subtransforms[:]
- add_parent(parent, parents.get(parent))
+ # Ensure that child is the lsat item in the parent's subtransforms.
+ # This is required to maintain topological order with sort_stages.
Review comment:
Isn't sort_stages called before this? Or did you mean this is required
to maintain the topological order from sort_stages?
----------------------------------------------------------------
Issue Time Tracking
-------------------
Worklog Id: (was: 518065)
Time Spent: 0.5h (was: 20m)
> pipeline_from_stages after sort_stages can return non-topologically ordered
> pipelines
> -------------------------------------------------------------------------------------
>
> Key: BEAM-11355
> URL: https://issues.apache.org/jira/browse/BEAM-11355
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Reporter: Yifan Mai
> Assignee: Yifan Mai
> Priority: P2
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> translations.sort_stages() sorts stages in topological order. However,
> calling translations.pipeline_from_stages() on sorted stages can result in a
> pipeline that is not topologically ordered. This is because of how it
> constructs the tree of parent to subtransforms.
> Example pipeline:
> * Leaf transforms are A, B and C.
> * Composite transform D has subtransforms B and C.
> * Root transform E has subtransforms A and D.
> * A produces an output that is an input to C, and B produces an output that
> is an input to C.
> * After optimizations and sort_stages(), the order of leaf stages is B, A, C
> (this is a valid ordering).
> Under the current implementation of translations.pipeline_from_stages():
> # B is added to the pipeline first, which also adds its parent D and its
> grandparent E to the pipeline. D is added as the first subtransform of E and
> B is added as the first subtransform of D.
> # A is added to the pipeline second. A is added as the second subtransform
> of E.
> # C is added to the pipeline third. C is added as the second subtransform of
> D.
> The order is now E(D(B, C), A) which is invalid because C must follow A. A
> valid order would be E(A, D(B, C)).
> The easiest fix is to change translations.pipeline_from_stages() such that
> whenever a leaf transform is added to the pipeline, all its ancestors are
> moved to the last position of the subtransforms of their respective parent.
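
The example in the description above can be reproduced with a small toy model. This is only a sketch under stated assumptions: PARENTS, build_tree, leaf_order and the move_ancestors_last flag are hypothetical names invented for illustration, and plain dicts and lists stand in for the pipeline proto. It is not the code in translations.py, only the ordering behaviour the description attributes to it and the fix it proposes.

```python
# Toy model of the example: leaves A, B, C; composite D(B, C); root E(A, D).
# All names here are hypothetical and do not come from apache_beam.
PARENTS = {"A": "E", "B": "D", "C": "D", "D": "E", "E": None}


def build_tree(sorted_leaves, move_ancestors_last):
    """Return {parent: [children]} built by adding leaves in the given order."""
    subtransforms = {}

    def add_parent(child, parent):
        if parent is None:
            return
        children = subtransforms.setdefault(parent, [])
        if child in children:
            if not move_ancestors_last:
                # Behaviour described in the issue: an ancestor that is
                # already present keeps its original position.
                return
            # Proposed fix: move the ancestor to the last position.
            children.remove(child)
        children.append(child)
        add_parent(parent, PARENTS[parent])

    for leaf in sorted_leaves:
        add_parent(leaf, PARENTS[leaf])
    return subtransforms


def leaf_order(subtransforms, root="E"):
    """List leaves in the order implied by a depth-first walk of the tree."""
    children = subtransforms.get(root)
    if children is None:
        return [root]  # a node with no recorded children is a leaf
    order = []
    for child in children:
        order.extend(leaf_order(subtransforms, child))
    return order


buggy = build_tree(["B", "A", "C"], move_ancestors_last=False)
fixed = build_tree(["B", "A", "C"], move_ancestors_last=True)
print(leaf_order(buggy))  # C is visited before A, so the order is invalid
print(leaf_order(fixed))  # A and B are both visited before C
```

With the flag off, the toy tree comes out as E(D(B, C), A), matching the invalid order in the description. With the flag on, adding C moves D behind A and the tree becomes E(A, D(B, C)).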