[ https://issues.apache.org/jira/browse/BEAM-10305?focusedWorklogId=450601&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-450601 ]
ASF GitHub Bot logged work on BEAM-10305:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 24/Jun/20 19:48
Start Date: 24/Jun/20 19:48
Worklog Time Spent: 10m
Work Description: mxm commented on a change in pull request #12062:
URL: https://github.com/apache/beam/pull/12062#discussion_r445131904
##########
File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/state/InMemoryBagUserStateFactory.java
##########
@@ -37,13 +37,19 @@
/**
* Holds user state in memory. Only one key is active at a time due to the GroupReduceFunction being
* called once per key. Needs to be reset via {@code resetForNewKey()} before processing a new key.
+ *
+ * <p>In case of any failures, this factory must be discarded. Otherwise, the contained state cache
+ * token would be reused which would corrupt the state cache.
*/
public class InMemoryBagUserStateFactory<K, V, W extends BoundedWindow>
implements StateRequestHandlers.BagUserStateHandlerFactory<K, V, W> {
+ private final ByteString cacheToken;
+
private List<InMemorySingleKeyBagState> handlers;
public InMemoryBagUserStateFactory() {
+ cacheToken = ByteString.copyFrom(UUID.randomUUID().toString().getBytes(Charsets.UTF_8));
Review comment:
I see, makes sense. Let me update the PR.
Issue Time Tracking
-------------------
Worklog Id: (was: 450601)
Time Spent: 2h 10m (was: 2h)
> InMemoryBagUserStateFactory creates a cache token per state cell
> ----------------------------------------------------------------
>
> Key: BEAM-10305
> URL: https://issues.apache.org/jira/browse/BEAM-10305
> Project: Beam
> Issue Type: Bug
> Components: java-fn-execution, runner-flink, sdk-py-harness
> Reporter: Maximilian Michels
> Assignee: Maximilian Michels
> Priority: P3
> Time Spent: 2h 10m
> Remaining Estimate: 0h
>
> When the state cache is enabled in the Python SDK, the batch mode of the
> Flink Runner currently supports only a single user state cell. The runner
> generates a new cache token for each state cell, but the caching code in the
> Python SDK Harness supports only one cache token per user state handler.
> In theory, multiple cache tokens would work, but they would only add to
> the payload. We should send a single cache token in batch mode, which is
> already the case in streaming.
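The difference between the two token strategies described above can be sketched in plain Java. This is a minimal illustration, not Beam's implementation: the class names SingleCacheTokenSketch, TokenPerFactory, and TokenPerCell and the method tokenForCell are hypothetical, and a byte[] stands in for the ByteString used in the pull request.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.UUID;

/** Hypothetical sketch; these types are illustrative and are not part of Beam. */
class SingleCacheTokenSketch {

  /** Creates one token per factory instance, as the constructor in the diff does. */
  static final class TokenPerFactory {
    private final byte[] cacheToken =
        UUID.randomUUID().toString().getBytes(StandardCharsets.UTF_8);

    /** Every state cell served by this factory shares the same token. */
    byte[] tokenForCell(String stateId) {
      return cacheToken.clone();
    }
  }

  /** The behavior reported in the issue: a fresh token for every state cell. */
  static final class TokenPerCell {
    byte[] tokenForCell(String stateId) {
      return UUID.randomUUID().toString().getBytes(StandardCharsets.UTF_8);
    }
  }

  public static void main(String[] args) {
    TokenPerFactory shared = new TokenPerFactory();
    TokenPerCell perCell = new TokenPerCell();

    // Two state cells from the same factory see an identical token.
    System.out.println(
        Arrays.equals(shared.tokenForCell("cellA"), shared.tokenForCell("cellB")));

    // With a token per cell, the two cells see different tokens.
    System.out.println(
        Arrays.equals(perCell.tokenForCell("cellA"), perCell.tokenForCell("cellB")));

    // Discarding the factory after a failure yields a new token, so a stale
    // token is never reused.
    TokenPerFactory replacement = new TokenPerFactory();
    System.out.println(
        Arrays.equals(shared.tokenForCell("cellA"), replacement.tokenForCell("cellA")));
  }
}
```

Holding the token in a final field ties its lifetime to the factory, which is why the Javadoc in the diff asks for the factory to be discarded after a failure.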