youngoli commented on a change in pull request #14822:
URL: https://github.com/apache/beam/pull/14822#discussion_r634815434



##########
File path: sdks/go/pkg/beam/core/runtime/pipelinex/replace.go
##########
@@ -207,11 +218,142 @@ func externalIns(counted map[string]bool, xforms 
map[string]*pipepb.PTransform,
        }
 }
 
-// ensureUniqueNames ensures that each name is unique. Any conflict is
-// resolved by adding '1, '2, etc to the name.
+type idSorted []string
+
+func (s idSorted) Len() int {
+       return len(s)
+}
+func (s idSorted) Swap(i, j int) {
+       s[i], s[j] = s[j], s[i]
+}
+
+var idParseExp = regexp.MustCompile(`(\D*)(\d*)`)

Review comment:
       This may have issues if there are IDs with numbers in the middle instead 
of just at the end. Does the Go SDK guarantee that IDs won't have numbers in 
the middle?

##########
File path: sdks/go/pkg/beam/runners/dataflow/dataflow.go
##########
@@ -81,6 +81,7 @@ func init() {
        beam.RegisterRunner("DataflowRunner", Execute)
 
        perf.RegisterProfCaptureHook("gcs_profile_writer", gcsRecorderHook)
+       pipelinex.IdempotentNormalize = true

Review comment:
       I think this is flipping this flag for multiple runners if the beamx 
package is used, since it imports multiple runners. Although I don't think 
that's actually an issue, but if we're going to activate it for all runners, 
let's do it explicitly by setting this to true by default in the original 
declaration.
   
   Or if we only want Dataflow using this, let's put this line somewhere in the 
Dataflow runner's execution path so it only gets called if the Dataflow runner 
is being used. 

##########
File path: sdks/go/pkg/beam/core/runtime/pipelinex/replace.go
##########
@@ -41,6 +44,10 @@ func Update(p *pipepb.Pipeline, values *pipepb.Components) 
(*pipepb.Pipeline, er
        return Normalize(ret)
 }
 
+// IdempotentNormalize determines whether to use the idempotent version
+// of ensureUniqueNames or the legacy version.
+var IdempotentNormalize bool

Review comment:
       If this boolean is going to be cleaned up later, might be worth adding a 
TODO here with a corresponding Jira.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to