lostluck commented on a change in pull request #14822:
URL: https://github.com/apache/beam/pull/14822#discussion_r635277280



##########
File path: sdks/go/pkg/beam/core/runtime/pipelinex/replace.go
##########
@@ -207,11 +218,142 @@ func externalIns(counted map[string]bool, xforms 
map[string]*pipepb.PTransform,
        }
 }
 
-// ensureUniqueNames ensures that each name is unique. Any conflict is
-// resolved by adding '1, '2, etc to the name.
+type idSorted []string
+
+func (s idSorted) Len() int {
+       return len(s)
+}
+func (s idSorted) Swap(i, j int) {
+       s[i], s[j] = s[j], s[i]
+}
+
+var idParseExp = regexp.MustCompile(`(\D*)(\d*)`)

Review comment:
       This is a case of "good enough" vs "perfect". This is only intended to 
make go's natural ID assignment scheme (e# for Transforms, and s# for 
composites) sort properly. I'll add a comment to that effect though.
   
   Basically this fixes having weird lexicographic ordering where the sort is 
e1 e10 e11 e2 e20 e21 e3 e30.... rather than having the uniquifiers assigned to 
addition to the pipeline order.  It would manifest often with a CoGBK near the 
start of a pipline getting a uniquifier, rather than one towards the end, which 
is unintuitive. 
   
   The Go SDK assigns ids sequentially, and the global ids are invariably 
"<letter><number>" by default. The complications come in when we add 
intermediate nodes, like the CoGBK's inject and Expand steps which add suffixes 
(like _injectN and _expand). However, we don't care so much with those. Such 
SDK additions are always wrapped in a composite ptransform, so there's nothing 
at the same "level" for them to get uniquified against. They're unique by their 
ancestors.
   
   The other times PTransform ids won't match this pattern is from xlang, which 
will have whatever the foreign SDK uses. But that's similarly handled by the 
"use the lexicographic" ordering case, and being wrapped in a composite.
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to