[ 
https://issues.apache.org/jira/browse/BEAM-12341?focusedWorklogId=598992&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-598992
 ]

ASF GitHub Bot logged work on BEAM-12341:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 19/May/21 03:44
            Start Date: 19/May/21 03:44
    Worklog Time Spent: 10m 
      Work Description: youngoli commented on a change in pull request #14822:
URL: https://github.com/apache/beam/pull/14822#discussion_r634815434



##########
File path: sdks/go/pkg/beam/core/runtime/pipelinex/replace.go
##########
@@ -207,11 +218,142 @@ func externalIns(counted map[string]bool, xforms 
map[string]*pipepb.PTransform,
        }
 }
 
-// ensureUniqueNames ensures that each name is unique. Any conflict is
-// resolved by adding '1, '2, etc to the name.
+type idSorted []string
+
+func (s idSorted) Len() int {
+       return len(s)
+}
+func (s idSorted) Swap(i, j int) {
+       s[i], s[j] = s[j], s[i]
+}
+
+var idParseExp = regexp.MustCompile(`(\D*)(\d*)`)

Review comment:
       This may have issues if there are IDs with numbers in the middle instead 
of just at the end. Does the Go SDK guarantee that IDs won't have numbers in 
the middle?

##########
File path: sdks/go/pkg/beam/runners/dataflow/dataflow.go
##########
@@ -81,6 +81,7 @@ func init() {
        beam.RegisterRunner("DataflowRunner", Execute)
 
        perf.RegisterProfCaptureHook("gcs_profile_writer", gcsRecorderHook)
+       pipelinex.IdempotentNormalize = true

Review comment:
       I think this is flipping this flag for multiple runners if the beamx 
package is used, since it imports multiple runners. Although I don't think 
that's actually an issue, but if we're going to activate it for all runners, 
let's do it explicitly by setting this to true by default in the original 
declaration.
   
   Or if we only want Dataflow using this, let's put this line somewhere in the 
Dataflow runner's execution path so it only gets called if the Dataflow runner 
is being used. 

##########
File path: sdks/go/pkg/beam/core/runtime/pipelinex/replace.go
##########
@@ -41,6 +44,10 @@ func Update(p *pipepb.Pipeline, values *pipepb.Components) 
(*pipepb.Pipeline, er
        return Normalize(ret)
 }
 
+// IdempotentNormalize determines whether to use the idempotent version
+// of ensureUniqueNames or the legacy version.
+var IdempotentNormalize bool

Review comment:
       If this boolean is going to be cleaned up later, might be worth adding a 
TODO here with a corresponding Jira.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 598992)
    Time Spent: 40m  (was: 0.5h)

> Use portable job submission for Go jobs to dataflow.
> ----------------------------------------------------
>
>                 Key: BEAM-12341
>                 URL: https://issues.apache.org/jira/browse/BEAM-12341
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-dataflow, sdk-go
>            Reporter: Robert Burke
>            Assignee: Robert Burke
>            Priority: P2
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> # Need to  set the "use_portable_job_submission" when sending jobs to 
> Dataflow.
>  # The Go SDK's portable format doesn't conform to how other SDKs generate 
> their graph, so simply switching the feature on will cause composite scopes 
> to not be represented in the runner's job view.  Since the Go SDK graph runs 
> through a separate sort and uniquifying process, it can't have these names 
> done at construction time, but after converting the graph from SDK to proto. 
> That represents a clear place to update all the names.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to