[
https://issues.apache.org/jira/browse/BEAM-11928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Beam JIRA Bot reassigned BEAM-11928:
------------------------------------
Assignee: (was: Jack McCluskey)
> Go SDK should use the combine_globally urn for global combines.
> ---------------------------------------------------------------
>
> Key: BEAM-11928
> URL: https://issues.apache.org/jira/browse/BEAM-11928
> Project: Beam
> Issue Type: Improvement
> Components: sdk-go
> Reporter: Robert Burke
> Priority: P3
> Labels: stale-assigned
> Time Spent: 5h 40m
> Remaining Estimate: 0h
>
> Reported on
> [https://stackoverflow.com/questions/66446338/issue-with-combine-function-in-apache-beam-go-sdk/66486052#66486052]
> The root is that the Go SDK doesn't use the
> "beam:transform:combine_globally:v1" URN, and always uses
> "beam:transform:combine_per_key:v1" even for global combines, with a
> AddFixedKey DoFn.
> URN in the proto:
> [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L347]
>
> Go SDK only having combine_per_key
> [https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/graphx/translate.go#L42]
> We currently "detect" combines via a CombinePerKey scope
> [https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/graph/edge.go#L434]
>
> added at beam.TryCombinePerKey
> [https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/combine.go#L58]
> We convert combines into the CombinePayload here
> [https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/graphx/translate.go#L253]
> called above here:
> [https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/graphx/translate.go#L241]
>
> We probably want to just add a graph.CombineGlobal op ( vs the existing
> combine node), or modify the "CombinePerKey" scope hack to have a
> CombineCombineGlobal variant, or somehting that is cleaner than currently
> exists.
> We'd also want to make sure the optimization takes place properly, which
> should be simple enough to detect timing wise at least once, if not as a
> regular benchmark.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)