Sam Whittle created BEAM-7547:
---------------------------------

             Summary: StreamingDataflowWorker can observe inconsistent cache 
for stale work items
                 Key: BEAM-7547
                 URL: https://issues.apache.org/jira/browse/BEAM-7547
             Project: Beam
          Issue Type: Bug
          Components: runner-dataflow
            Reporter: Sam Whittle
            Assignee: Sam Whittle


1. The Dataflow backend generates a work item with cache token C.
2. StreamingDataflowWorker receives the work item and reads the state using C; 
it either hits the cache or performs a read.
3. The Dataflow backend sends a retry of the work item (possibly because it 
thinks the original work item never reached the StreamingDataflowWorker).
4. StreamingDataflowWorker commits the work item and receives an ack from the 
Dataflow backend. It caches the state for the key using C.
5. StreamingDataflowWorker receives the retried work item with cache token C. 
It uses the cached state, which can cause user-visible consistency failures 
because the cached view reflects the state after the work item completed 
processing.
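
The sequence above can be sketched in a few lines of plain Java. This is only 
an illustrative model, not the StreamingDataflowWorker code: the class 
StaleCacheSketch, its methods, and the use of a set of strings as "state" are 
all hypothetical, and the backend is reduced to an empty initial read.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of a per-key state cache indexed by cache token.
public class StaleCacheSketch {
    private final Map<String, Set<String>> cacheByToken = new HashMap<>();

    // Step 2 / step 5: return the cached state for the token if present,
    // otherwise model a backend read that returns empty state (assumption).
    Set<String> readState(String cacheToken) {
        Set<String> cached = cacheByToken.get(cacheToken);
        return cached != null ? new HashSet<>(cached) : new HashSet<>();
    }

    // Step 4: after a successful commit, cache the post-commit state
    // under the same token the work item was issued with.
    void commitAndCache(String cacheToken, Set<String> state) {
        cacheByToken.put(cacheToken, new HashSet<>(state));
    }

    // Replays steps 1-5 and reports whether the retried work item
    // observes the effects of the original attempt.
    static boolean retryObservesPostCommitState() {
        StaleCacheSketch worker = new StaleCacheSketch();
        Set<String> original = worker.readState("C"); // step 2: cache miss
        original.add("item-1");                       // user logic mutates state
        // step 3: the backend issues a retry carrying the same token C
        worker.commitAndCache("C", original);         // step 4
        Set<String> retried = worker.readState("C");  // step 5: cache hit
        return retried.contains("item-1");
    }

    public static void main(String[] args) {
        System.out.println(retryObservesPostCommitState());
    }
}
```

In this model the retried work item starts from state that already contains 
"item-1", which is the inconsistent view described in step 5.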

Note that this will not corrupt Dataflow persistent state, because the commit 
of the retried work item using the inconsistent cache will fail. However, it 
may cause failures in user logic, for example if users keep the set of all seen 
items in state and throw an exception on duplicates that should have been 
removed by an upstream stage.
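
As a rough illustration of the kind of user logic that can break, the sketch 
below keeps a set of seen items and throws on a duplicate. It is plain Java 
rather than Beam state API code, and the names DuplicateCheckSketch and 
processElement are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical user logic: remember every item seen and reject duplicates.
public class DuplicateCheckSketch {
    // Adds the item to the seen set, throwing if it was already present.
    static void processElement(Set<String> seen, String item) {
        if (!seen.add(item)) {
            throw new IllegalStateException("duplicate item: " + item);
        }
    }

    // Runs the original attempt, then a retry against the same
    // (post-commit) state, and reports whether the retry throws.
    static boolean failsOnStaleView() {
        Set<String> state = new HashSet<>();
        processElement(state, "item-1"); // original work item succeeds
        try {
            processElement(state, "item-1"); // retry sees its own earlier write
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(failsOnStaleView());
    }
}
```

The exception here comes from the stale cached view, not from a real duplicate 
in the input, so the user sees a failure even though upstream deduplication 
worked.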


