[
https://issues.apache.org/jira/browse/BEAM-7547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ismaël Mejía updated BEAM-7547:
-------------------------------
Status: Open (was: Triage Needed)
> StreamingDataflowWorker can observe inconsistent cache for stale work items
> ---------------------------------------------------------------------------
>
> Key: BEAM-7547
> URL: https://issues.apache.org/jira/browse/BEAM-7547
> Project: Beam
> Issue Type: Bug
> Components: runner-dataflow
> Reporter: Sam Whittle
> Assignee: Sam Whittle
> Priority: Minor
> Time Spent: 40m
> Remaining Estimate: 0h
>
> 1. Dataflow backend generates a work item with a cache token C.
> 2. StreamingDataflowWorker receives the work item and reads the state using
> C; it either hits the cache or performs a read.
> 3. Dataflow backend sends a retry of the work item (possibly because it
> thinks the original work item never reached the StreamingDataflowWorker).
> 4. StreamingDataflowWorker commits the work item and gets an ack from the
> Dataflow backend. It caches the state for the key using C.
> 5. StreamingDataflowWorker receives the retried work item with cache token C.
> It uses the cached state, which can cause consistency failures in user code
> because the cached view reflects the state after the work item completed
> processing.
> Note that this will not corrupt Dataflow persistent state, because the
> commit of the retried work item using the inconsistent cache will fail.
> However, it may cause failures in user logic, for example if users keep the
> set of all seen items in state and throw an exception on duplicates, which
> should have been removed by an upstream stage.
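
The five steps in the description can be replayed with a small toy model. This is only a sketch under stated assumptions, not the StreamingDataflowWorker code: the class StaleWorkItemSketch, its CacheEntry type, and the methods readState, process, and commit are hypothetical names invented for illustration, and a cache miss simply returns an empty set as a stand-in for a read from the backend. The user logic is the "set of all seen items" example from the description.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of the stale work item scenario; all names here are hypothetical. */
class StaleWorkItemSketch {
  /** Cached state for one key, tagged with the cache token it was stored under. */
  static final class CacheEntry {
    final long token;
    final Set<String> seen;

    CacheEntry(long token, Set<String> seen) {
      this.token = token;
      this.seen = seen;
    }
  }

  private final Map<String, CacheEntry> cache = new HashMap<>();

  /** Steps 2 and 5: use the cache if the token matches, otherwise "read". */
  Set<String> readState(String key, long token) {
    CacheEntry entry = cache.get(key);
    if (entry != null && entry.token == token) {
      return new HashSet<>(entry.seen); // cache hit
    }
    return new HashSet<>(); // assumption: stand-in for a backend read of empty state
  }

  /** User logic from the description: remember seen items, throw on duplicates. */
  static Set<String> process(Set<String> seen, String item) {
    if (!seen.add(item)) {
      throw new IllegalStateException("duplicate item: " + item);
    }
    return seen;
  }

  /** Step 4: after the commit is acked, cache the new state under the same token. */
  void commit(String key, long token, Set<String> newState) {
    cache.put(key, new CacheEntry(token, new HashSet<>(newState)));
  }

  /** Replays steps 1-5 and reports whether the retry hit the duplicate check. */
  static boolean retryObservesDuplicate() {
    StaleWorkItemSketch worker = new StaleWorkItemSketch();
    long c = 42L; // step 1: cache token C chosen by the backend

    Set<String> state = worker.readState("key", c); // step 2
    // Step 3: the backend issues a retry of the same work item with token C.
    worker.commit("key", c, process(state, "item-1")); // step 4

    try {
      // Step 5: the retry reads via token C and sees the post-commit state.
      process(worker.readState("key", c), "item-1");
      return false;
    } catch (IllegalStateException duplicate) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println("retry hit duplicate check: " + retryObservesDuplicate());
  }
}
```

In this toy model the retry fails inside user logic before it ever reaches a commit, which matches the concern in the description: persistent state stays intact, but user code sees a view of state that already includes the work item's own effects.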