[ 
https://issues.apache.org/jira/browse/BEAM-11216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Whittle reassigned BEAM-11216:
----------------------------------

    Assignee: Sam Whittle

> StreamingDataflowWorker ReaderCache usage can be incorrect in presence of 
> retries
> ---------------------------------------------------------------------------------
>
>                 Key: BEAM-11216
>                 URL: https://issues.apache.org/jira/browse/BEAM-11216
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-dataflow
>            Reporter: Sam Whittle
>            Assignee: Sam Whittle
>            Priority: P2
>
> This is similar to BEAM-7547.  The issue and identified there was related to 
> the state cache.  However for UnboundedSource there is a separate 
> ReaderCache.  In particular the following sequence could lead to using a 
> Reader at the wrong position.
> 1. Work arrives indicating to use reader at state C1
> 2. Work processes and advances reader to C2, commit is prepared with elements 
> output from C1 to C2 and reader checkpoint for C2
> 3. Commit of original processing fails (perhaps routed to previous backend 
> during autoscaling)
> 4. Work is retried (still indicating to use reader at C1 and still same cache 
> token) however the ReaderCache is used, there is a hit, and the existing 
> reader (positioned at C2) is used.
> 5. Retry of work processes and advances reader to C3, commit is scheduled 
> with elements output from C2 to C3 and reader checkpoint for C3
> 6. Commit succeeds
> At this point there was never a successful commit for the elements between C1 
> to C2, though the reader is now advanced past them.
> Possible fixes:
> 1. Use increasing work token as in BEAM-7547 to detect retry and not use the 
> ReaderCache entry
> 2. Seek recovered reader from cache to checkpoint or only use if it matches 
> the checkpoint. This would probably involve changing the Checkpoint interface 
> so 1 is likely preferred.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to