[
https://issues.apache.org/jira/browse/BEAM-14128?focusedWorklogId=744270&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-744270
]
ASF GitHub Bot logged work on BEAM-14128:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 18/Mar/22 17:37
Start Date: 18/Mar/22 17:37
Worklog Time Spent: 10m
Work Description: lostluck commented on a change in pull request #17120:
URL: https://github.com/apache/beam/pull/17120#discussion_r830218179
##########
File path: sdks/go/pkg/beam/core/runtime/pipelinex/replace_test.go
##########
@@ -310,6 +311,28 @@ func TestComputeInputOutput(t *testing.T) {
}
}
+func BenchmarkComputeInputOutput(b *testing.B) {
+ in := make(map[string]*pipepb.PTransform)
+ for i := 0; i < 3000; i++ {
Review comment:
Can we add a comment about what this graph looks like? I think it's just
a linked list of nested primitives and composites, but I'd like a quick
confirmation.
##########
File path: sdks/go/pkg/beam/core/runtime/pipelinex/replace.go
##########
@@ -118,16 +118,25 @@ func makeParentMap(xforms map[string]*pipepb.PTransform)
map[string]string {
func computeCompositeInputOutput(xforms map[string]*pipepb.PTransform)
map[string]*pipepb.PTransform {
ret := reflectx.ShallowClone(xforms).(map[string]*pipepb.PTransform)
Review comment:
```suggestion
// Precompute the transforms that consume each PCollection as input.
```
##########
File path: sdks/go/pkg/beam/core/runtime/pipelinex/replace.go
##########
@@ -201,19 +210,11 @@ func diffAndMerge(out, in, extIn map[string]bool)
map[string]string {
}
// externalIns checks the unseen non-composite graph
-func externalIns(counted map[string]bool, xforms
map[string]*pipepb.PTransform, extIn, out map[string]bool) {
- for id, pt := range xforms {
- // Ignore other composites or already counted transforms.
- if counted[id] || len(pt.GetSubtransforms()) != 0 {
- continue
- }
- // Check this PTransform's inputs for anything output by
something
- // the current composite.
- for col := range out {
- for _, incol := range pt.GetInputs() {
- if col == incol {
- extIn[col] = true
- }
+func externalIns(counted map[string]bool, primitiveXformsForInput
map[string][]string, extIn, out map[string]bool) {
+ for col := range out {
+ for _, id := range primitiveXformsForInput[col] {
+ if !counted[id] {
+ extIn[col] = true
Review comment:
```suggestion
func externalIns(counted map[string]bool, primitiveXformsForInput
map[string][]string, extIn, out map[string]bool) {
// For this composite's output PCollections.
for col := range out {
// See if any transforms are using it.
for _, id := range primitiveXformsForInput[col] {
if !counted[id] {
// And if they're part of the composite
// Ensure the collections is in the set of
outputs for the composite.
extIn[col] = true
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 744270)
Time Spent: 0.5h (was: 20m)
> Quadratic behavior in go pipeline rewriting.
> --------------------------------------------
>
> Key: BEAM-14128
> URL: https://issues.apache.org/jira/browse/BEAM-14128
> Project: Beam
> Issue Type: Bug
> Components: sdk-go
> Affects Versions: 2.37.0
> Reporter: Robert Burke
> Assignee: Robert Burke
> Priority: P2
> Fix For: 2.38.0
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Work is repeated for every node outside of a given composite, when checking
> if anything is using any outputs of transforms internal to the composite.
> This bug significantly increases the construction time large pipeline graphs,
> but can be resolved with bit of pre-computation and plumbing of which
> transforms are using which PCollections.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)