[ 
https://issues.apache.org/jira/browse/BEAM-10305?focusedWorklogId=450565&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-450565
 ]

ASF GitHub Bot logged work on BEAM-10305:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 24/Jun/20 18:17
            Start Date: 24/Jun/20 18:17
    Worklog Time Spent: 10m 
      Work Description: lukecwik commented on a change in pull request #12062:
URL: https://github.com/apache/beam/pull/12062#discussion_r445083333



##########
File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/state/InMemoryBagUserStateFactory.java
##########
@@ -37,13 +37,19 @@
 /**
  * Holds user state in memory. Only one key is active at a time due to the GroupReduceFunction being
  * called once per key. Needs to be reset via {@code resetForNewKey()} before processing a new key.
+ *
+ * <p>In case of any failures, this factory must be discarded. Otherwise, the contained state cache
+ * token would be reused which would corrupt the state cache.
  */
 public class InMemoryBagUserStateFactory<K, V, W extends BoundedWindow>
     implements StateRequestHandlers.BagUserStateHandlerFactory<K, V, W> {
 
+  private final ByteString cacheToken;
+
   private List<InMemorySingleKeyBagState> handlers;
 
   public InMemoryBagUserStateFactory() {
+    cacheToken = ByteString.copyFrom(UUID.randomUUID().toString().getBytes(Charsets.UTF_8));

Review comment:
       
`ByteStringStateRequestHandlerToBagUserStateHandlerFactoryAdapter#getCacheTokens` is going to add N copies of the cache token to the `ProcessBundleRequest`.
   
   For user state, it makes sense to have a caching handler that delegates to the other handlers. That caching handler should be the one responsible for supplying the single cache token.
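
   The suggestion above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in the pull request: `BagHandler`, `InMemoryBagHandler`, and `SingleTokenBagHandlers` are hypothetical names, state is keyed by plain strings, and the token is a `byte[]` rather than Beam's `ByteString` so the example stays self-contained.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

/** Hypothetical stand-in for a per-cell bag state handler; not a Beam interface. */
interface BagHandler {
  List<String> get(String key);

  void append(String key, String value);
}

/** Plain in-memory handler for one state cell. It carries no cache token of its own. */
final class InMemoryBagHandler implements BagHandler {
  private final Map<String, List<String>> values = new HashMap<>();

  @Override
  public List<String> get(String key) {
    // Return a copy so callers cannot mutate the stored bag.
    return new ArrayList<>(values.getOrDefault(key, new ArrayList<>()));
  }

  @Override
  public void append(String key, String value) {
    values.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
  }
}

/** Hypothetical wrapper that owns the one cache token and delegates per state cell. */
final class SingleTokenBagHandlers {
  // Generated once per wrapper instance, mirroring the UUID-based token in the diff.
  private final byte[] cacheToken =
      UUID.randomUUID().toString().getBytes(StandardCharsets.UTF_8);
  private final Map<String, BagHandler> delegates = new HashMap<>();

  /** Returns the delegate for a state cell, creating it on first use. */
  BagHandler forStateCell(String stateId) {
    return delegates.computeIfAbsent(stateId, id -> new InMemoryBagHandler());
  }

  /** Always returns exactly one token, regardless of how many state cells exist. */
  List<byte[]> getCacheTokens() {
    List<byte[]> tokens = new ArrayList<>();
    tokens.add(cacheToken.clone());
    return tokens;
  }
}
```

   With this shape, registering a second or third state cell through `forStateCell` leaves the result of `getCacheTokens()` unchanged at one entry, since the token belongs to the wrapper rather than to each delegate.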




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 450565)
    Time Spent: 2h  (was: 1h 50m)

> InMemoryBagUserStateFactory creates a cache token per state cell
> ----------------------------------------------------------------
>
>                 Key: BEAM-10305
>                 URL: https://issues.apache.org/jira/browse/BEAM-10305
>             Project: Beam
>          Issue Type: Bug
>          Components: java-fn-execution, runner-flink, sdk-py-harness
>            Reporter: Maximilian Michels
>            Assignee: Maximilian Michels
>            Priority: P3
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> When the state cache is enabled in the Python SDK, the batch mode of the 
> Flink Runner currently only allows a single user state cell because a new 
> cache token is generated for each state cell; the caching code in the Python 
> SDK Harness only supports one cache token per user state handler. 
> Theoretically multiple cache tokens would work but would just be adding to 
> the payload. We should make sure to just send a single cache token in batch 
> mode (which is already the case in streaming).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
