[ 
https://issues.apache.org/jira/browse/BEAM-7547?focusedWorklogId=264164&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-264164
 ]

ASF GitHub Bot logged work on BEAM-7547:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 20/Jun/19 22:03
            Start Date: 20/Jun/19 22:03
    Worklog Time Spent: 10m 
      Work Description: scwhittle commented on issue #8842: [BEAM-7547] Avoid 
WindmillStateCache cache hits for stale work.
URL: https://github.com/apache/beam/pull/8842#issuecomment-504216208
 
 
   The error is:
   
   /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Java_Commit/src/runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/WindmillStateCacheTest.java:320: error: method forKey in class WindmillStateCache.ForComputation cannot be applied to given types;
   13:05:19         cache.forComputation("comp1").forKey(ByteString.copyFromUtf8("key1"), STATE_FAMILY, 0L);
   13:05:19                                      ^
   13:05:19   required: ByteString,String,long,long
   13:05:19   found: ByteString,String,long
   13:05:19   reason: actual and formal argument lists differ in length
   
   I don't get the error locally. The referenced line doesn't exist in the 
changed file, and all uses of forKey in the latest commit have 4 parameters, as 
expected.  Does the presubmit somehow cache build/test results?
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 264164)
    Time Spent: 1h 40m  (was: 1.5h)

> StreamingDataflowWorker can observe inconsistent cache for stale work items
> ---------------------------------------------------------------------------
>
>                 Key: BEAM-7547
>                 URL: https://issues.apache.org/jira/browse/BEAM-7547
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-dataflow
>            Reporter: Sam Whittle
>            Assignee: Sam Whittle
>            Priority: Minor
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> 1. Dataflow backend generates a work item with a cache token C.
> 2. StreamingDataflowWorker receives the work item and reads the state using 
> C; it either hits the cache or performs a read.
> 3. Dataflow backend sends a retry of the work item (possibly because it 
> thinks the original work item never reached the StreamingDataflowWorker).
> 4. StreamingDataflowWorker commits the work item and gets an ack from the 
> Dataflow backend.  It caches the state for the key using C.
> 5. StreamingDataflowWorker receives the retried work item with cache token C. 
>  It uses the cached state, which can cause user-visible consistency failures 
> because the cached view reflects the state after the work item completed 
> processing.
> Note that this will not corrupt Dataflow persistent state, because the 
> commit of the retried work item using the inconsistent cache will fail. 
> However, it may cause failures in user logic, for example if users keep the 
> set of all seen items in state and throw an exception on duplicates that 
> should have been removed by an upstream stage.
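
The sequence above can be sketched with a toy cache. This is only an
illustration, not the real WindmillStateCache code or the change in the pull
request. The class TokenCheckedCache, its put/get methods, and the assumption
that work tokens for a key increase from one work item to the next are all
hypothetical. The idea is that a lookup supplies a work token as well as the
cache token, and an entry written by the same or a later work item counts as a
miss.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Toy per-key state cache (hypothetical; NOT the real WindmillStateCache API). */
final class TokenCheckedCache {

  /** One cached state value plus the tokens it was written under. */
  private static final class Entry {
    final long cacheToken;
    final long workToken;
    final String state;

    Entry(long cacheToken, long workToken, String state) {
      this.cacheToken = cacheToken;
      this.workToken = workToken;
      this.state = state;
    }
  }

  private final Map<String, Entry> entries = new HashMap<>();

  /** Step 4: after a successful commit, cache the state written by that work item. */
  void put(String key, long cacheToken, long workToken, String state) {
    entries.put(key, new Entry(cacheToken, workToken, state));
  }

  /**
   * Steps 2 and 5: return the cached state only if the cache token matches AND
   * the reader is a strictly newer work item than the one that wrote the entry.
   * Assumes (hypothetically) that work tokens increase per key.
   */
  Optional<String> get(String key, long cacheToken, long workToken) {
    Entry e = entries.get(key);
    if (e == null || e.cacheToken != cacheToken) {
      return Optional.empty(); // nothing cached, or cached under a different token
    }
    if (workToken <= e.workToken) {
      return Optional.empty(); // retry of already-committed (or older) work: force a read
    }
    return Optional.of(e.state);
  }
}
```

With a check on the cache token alone, the retried work item in step 5 would
get a hit and see post-commit state. With the extra work token comparison it
misses and falls back to a read, while a genuinely newer work item carrying the
same cache token still hits.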



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
