[
https://issues.apache.org/jira/browse/BEAM-13628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jack McCluskey updated BEAM-13628:
----------------------------------
Resolution: Fixed
Status: Resolved (was: Open)
> [Go SDK] Make Side input cache fit resolved semantics.
> ------------------------------------------------------
>
> Key: BEAM-13628
> URL: https://issues.apache.org/jira/browse/BEAM-13628
> Project: Beam
> Issue Type: Bug
> Components: sdk-go
> Affects Versions: 2.35.0
> Reporter: Robert Burke
> Assignee: Jack McCluskey
> Priority: P2
> Fix For: 2.36.0
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> It's been determined that the documentation in the proto was a bit buggy
> with respect to side input semantics. Prior to
> https://github.com/apache/beam/pull/16474 it said state cache tokens are
> globally unique; however, in the implementation and the original design they
> are unique only with respect to their associated StateKeys.
> This means the Go SDK's side input cache is broken as delivered and can
> cause a correctness issue when there are multiple distinct side inputs of
> the same type. The mitigation is to not use the side input cache in affected
> versions (2.35.0). The cache is off by default.
> The correction will use the whole state key (which, for side inputs, includes
> the (transformID, SideInputID) tuple, plus a user key if it's a multimap side
> input), along with the runner-provided token.
> Since this can at worst cause a data correctness issue rather than a pipeline
> failure, this should be part of the 2.36.0 release. We may also wish to
> backport it to a 2.35.1 patch release, for the Go SDK only, to close the gap.
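
The keying scheme described above can be sketched roughly as follows. This is a minimal illustration, not the Go SDK's actual cache: the types stateKey, cacheKey, and sideInputCache, their fields, and the Put/Get methods are all hypothetical names, and cached values are simplified to string slices. It only shows why pairing the whole state key with the runner-provided token keeps two side inputs that share a token from colliding.

```go
package main

import "fmt"

// stateKey is a hypothetical stand-in for the side-input part of a state key:
// the transform ID, the side input ID, and a user key that is only set for
// multimap side inputs.
type stateKey struct {
	TransformID string
	SideInputID string
	UserKey     string // empty unless this is a multimap side input
}

// cacheKey pairs the whole state key with the runner-provided token.
// Tokens are assumed unique only with respect to their state key, so the
// token alone is not a safe map key.
type cacheKey struct {
	Key   stateKey
	Token string
}

// sideInputCache is a hypothetical, unbounded cache used only for illustration.
type sideInputCache struct {
	entries map[cacheKey][]string
}

func newSideInputCache() *sideInputCache {
	return &sideInputCache{entries: make(map[cacheKey][]string)}
}

// Put stores values under the (state key, token) pair.
func (c *sideInputCache) Put(k stateKey, token string, values []string) {
	c.entries[cacheKey{Key: k, Token: token}] = values
}

// Get returns the values cached for the (state key, token) pair, if any.
func (c *sideInputCache) Get(k stateKey, token string) ([]string, bool) {
	v, ok := c.entries[cacheKey{Key: k, Token: token}]
	return v, ok
}

func main() {
	c := newSideInputCache()

	// Two distinct side inputs on the same transform that happen to be
	// issued the same token by the runner.
	first := stateKey{TransformID: "t1", SideInputID: "i0"}
	second := stateKey{TransformID: "t1", SideInputID: "i1"}

	c.Put(first, "tok", []string{"a"})
	c.Put(second, "tok", []string{"b"})

	va, _ := c.Get(first, "tok")
	vb, _ := c.Get(second, "tok")
	fmt.Println(va, vb) // prints: [a] [b]
}
```

Had the map been keyed by the token alone, the second Put would have overwritten the first, and a lookup for the first side input would have returned the second side input's data.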