[ 
https://issues.apache.org/jira/browse/BEAM-13628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jack McCluskey updated BEAM-13628:
----------------------------------
    Resolution: Fixed
        Status: Resolved  (was: Open)

> [Go SDK] Make Side input cache fit resolved semantics.
> ------------------------------------------------------
>
>                 Key: BEAM-13628
>                 URL: https://issues.apache.org/jira/browse/BEAM-13628
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-go
>    Affects Versions: 2.35.0
>            Reporter: Robert Burke
>            Assignee: Jack McCluskey
>            Priority: P2
>             Fix For: 2.36.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> It's been determined that the documentation in the proto was a bit buggy WRT 
> side input semantics. Prior to https://github.com/apache/beam/pull/16474 
> it said state cache tokens are globally unique; however, in the implementation 
> and the original design they are unique only WRT their associated StateKeys.
> This means the Go SDK's side input cache is broken as delivered and can 
> cause a correctness issue when there are multiple distinct side inputs of 
> the same type. The mitigation is to not use the side input cache in affected 
> versions (2.35.0). The cache is off by default.
> The correction will use the whole state key (which, for side inputs, includes 
> the (transformID, SideInputID) tuple, plus a user key if it's a multimap side 
> input), along with the runner-provided token.
> Since this can at worst cause a data correctness issue rather than a pipeline 
> failure, this should be part of the 2.36.0 release. We may also wish to backport 
> it to a 2.35.1 patch release, for the Go SDK only, to close the gap.
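
The correction described above can be illustrated with a minimal sketch. This is not the Go SDK's actual cache code: the names sideInputKey, sideInputCache, put, and get are hypothetical, values are simplified to string slices, and the user key is modeled as a string (empty unless the side input is a multimap). It only shows why keying on the whole state key plus the runner-provided token keeps two distinct side inputs apart even when they share a token.

```go
package main

import "fmt"

// sideInputKey is a hypothetical composite cache key: the parts of the
// state key that identify a side input, plus the runner-provided token.
type sideInputKey struct {
	transformID string
	sideInputID string
	userKey     string // assumption: empty unless this is a multimap side input
	token       string // runner-provided cache token
}

// sideInputCache is a hypothetical, unbounded, non-concurrent cache used
// only to illustrate the keying scheme.
type sideInputCache struct {
	entries map[sideInputKey][]string
}

func newSideInputCache() *sideInputCache {
	return &sideInputCache{entries: make(map[sideInputKey][]string)}
}

// put stores the materialized values for one side input under its full key.
func (c *sideInputCache) put(k sideInputKey, vals []string) {
	c.entries[k] = vals
}

// get returns the cached values only if every field of the key matches.
func (c *sideInputCache) get(k sideInputKey) ([]string, bool) {
	v, ok := c.entries[k]
	return v, ok
}

func main() {
	c := newSideInputCache()

	// Two distinct side inputs on the same transform that share a token.
	a := sideInputKey{transformID: "t1", sideInputID: "i0", token: "tok"}
	b := sideInputKey{transformID: "t1", sideInputID: "i1", token: "tok"}

	c.put(a, []string{"a1", "a2"})

	gotA, hitA := c.get(a)
	_, hitB := c.get(b)
	fmt.Println("lookup a:", hitA, gotA)
	fmt.Println("lookup b:", hitB)
}
```

A cache keyed on the token alone would return the values stored for the first side input when the second is looked up, which is the correctness issue described in this ticket. With the composite key, the second lookup is a miss and the data is fetched from the runner instead.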



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
