[
https://issues.apache.org/jira/browse/BEAM-2918?focusedWorklogId=158071&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-158071
]
ASF GitHub Bot logged work on BEAM-2918:
----------------------------------------
Author: ASF GitHub Bot
Created on: 24/Oct/18 10:18
Start Date: 24/Oct/18 10:18
Worklog Time Spent: 10m
Work Description: mxm commented on a change in pull request #6726:
[BEAM-2918] Add state support for streaming in portable FlinkRunner
URL: https://github.com/apache/beam/pull/6726#discussion_r227728565
##########
File path:
runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/ExecutableStageDoFnOperator.java
##########
@@ -135,31 +149,131 @@ public void open() throws Exception {
// ownership of the higher level "factories" explicit? Do we care?
stageContext = contextFactory.get(jobInfo);
- stateRequestHandler = getStateRequestHandler(executableStage);
stageBundleFactory = stageContext.getStageBundleFactory(executableStage);
+ stateRequestHandler = getStateRequestHandler(executableStage);
progressHandler = BundleProgressHandler.unsupported();
outputQueue = new LinkedBlockingQueue<>();
}
private StateRequestHandler getStateRequestHandler(ExecutableStage executableStage) {
+ final StateRequestHandler sideInputStateHandler;
if (executableStage.getSideInputs().size() > 0) {
checkNotNull(super.sideInputHandler);
StateRequestHandlers.SideInputHandlerFactory sideInputHandlerFactory =
Preconditions.checkNotNull(
FlinkStreamingSideInputHandlerFactory.forStage(
executableStage, sideInputIds, super.sideInputHandler));
try {
- return StateRequestHandlers.forSideInputHandlerFactory(
- ProcessBundleDescriptors.getSideInputs(executableStage), sideInputHandlerFactory);
+ sideInputStateHandler =
+ StateRequestHandlers.forSideInputHandlerFactory(
+ ProcessBundleDescriptors.getSideInputs(executableStage), sideInputHandlerFactory);
} catch (IOException e) {
- throw new RuntimeException(e);
+ throw new RuntimeException("Failed to initialize SideInputHandler", e);
+ }
+ } else {
+ sideInputStateHandler = StateRequestHandler.unsupported();
+ }
+
+ final StateRequestHandler userStateRequestHandler;
+ if (executableStage.getUserStates().size() > 0) {
+ if (keyedStateInternals == null) {
+ throw new IllegalStateException("Input must be keyed when user state is used");
}
+ userStateRequestHandler =
+ StateRequestHandlers.forBagUserStateHandlerFactory(
+ stageBundleFactory.getProcessBundleDescriptor(),
+ new BagUserStateFactory(keyedStateInternals, getKeyedStateBackend()));
} else {
- return StateRequestHandler.unsupported();
+ userStateRequestHandler = StateRequestHandler.unsupported();
+ }
+
+ EnumMap<TypeCase, StateRequestHandler> handlerMap = new EnumMap<>(TypeCase.class);
+ handlerMap.put(TypeCase.MULTIMAP_SIDE_INPUT, sideInputStateHandler);
+ handlerMap.put(TypeCase.BAG_USER_STATE, userStateRequestHandler);
+
+ return StateRequestHandlers.delegateBasedUponType(handlerMap);
+ }
+
+ private static class BagUserStateFactory
+ implements StateRequestHandlers.BagUserStateHandlerFactory {
+
+ private final StateInternals stateInternals;
+ private final KeyedStateBackend<ByteBuffer> keyedStateBackend;
+
+ private BagUserStateFactory(
+ StateInternals stateInternals, KeyedStateBackend<ByteBuffer> keyedStateBackend) {
+
+ this.stateInternals = stateInternals;
+ this.keyedStateBackend = keyedStateBackend;
+ }
+
+ @Override
+ public <K, V, W extends BoundedWindow>
+ StateRequestHandlers.BagUserStateHandler<K, V, W> forUserState(
+ String pTransformId,
+ String userStateId,
+ Coder<K> keyCoder,
+ Coder<V> valueCoder,
+ Coder<W> windowCoder) {
+ return new StateRequestHandlers.BagUserStateHandler<K, V, W>() {
+ @Override
+ public Iterable<V> get(K key, W window) {
+ prepareStateBackend(key, keyCoder);
Review comment:
Good question. I thought the GrpcStateService was an actor that processes
incoming state requests on a single thread. Since the state requests themselves
are processed synchronously, we should be fine then.

However, GrpcStateService can handle requests from multiple clients
concurrently, although there is no more than one connection per harness. So
when multiple harnesses are connected, there may be concurrent requests, which
would not be a problem because they access different operator state.

ExecutableStages should not be chained together because that would break the
assumption that they run with independent operator state. So it would be a
problem if two chained ExecutableStages each used their own SDK harness. AFAIK
that shouldn't happen. Perhaps we should disable chaining for stateful
execution, just to be sure.
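
To make the argument above concrete, here is a minimal sketch of the
assumption it rests on. It does not use Beam or Flink; HarnessStateSketch and
OperatorState are hypothetical stand-ins. Each harness is modeled as one
single-threaded executor (one connection, requests handled in order), and each
harness only touches its own non-thread-safe state, so requests from different
harnesses can run concurrently without losing updates.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical illustration only; not a Beam or Flink class. */
public class HarnessStateSketch {

  /** Bag state of one operator, keyed by user key. Deliberately not thread-safe. */
  static final class OperatorState {
    private final Map<String, List<Integer>> bags = new HashMap<>();

    void append(String key, int value) {
      bags.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    int size(String key) {
      List<Integer> bag = bags.get(key);
      return bag == null ? 0 : bag.size();
    }
  }

  /** Sends `requests` appends per harness; every harness owns its own state. */
  static List<OperatorState> run(int harnesses, int requests) throws InterruptedException {
    List<OperatorState> states = new ArrayList<>();
    List<ExecutorService> connections = new ArrayList<>();
    for (int h = 0; h < harnesses; h++) {
      OperatorState state = new OperatorState();
      // One single-threaded executor models the single connection per harness.
      ExecutorService connection = Executors.newSingleThreadExecutor();
      states.add(state);
      connections.add(connection);
      for (int r = 0; r < requests; r++) {
        final int value = r;
        connection.submit(() -> state.append("key", value));
      }
    }
    // Wait for all requests before the caller reads the state.
    for (ExecutorService connection : connections) {
      connection.shutdown();
      if (!connection.awaitTermination(10, TimeUnit.SECONDS)) {
        throw new IllegalStateException("requests did not finish in time");
      }
    }
    return states;
  }

  public static void main(String[] args) throws InterruptedException {
    for (OperatorState state : run(4, 1000)) {
      System.out.println(state.size("key"));
    }
  }
}
```

If two executors were handed the same OperatorState, which is what two chained
stages with separate harnesses would amount to in this model, the unsynchronized
HashMap could be modified from two threads at once and the guarantee would no
longer hold.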
Issue Time Tracking
-------------------
Worklog Id: (was: 158071)
Time Spent: 9h 20m (was: 9h 10m)
> Flink support for portable user state
> -------------------------------------
>
> Key: BEAM-2918
> URL: https://issues.apache.org/jira/browse/BEAM-2918
> Project: Beam
> Issue Type: Sub-task
> Components: runner-flink
> Reporter: Henning Rohde
> Assignee: Maximilian Michels
> Priority: Minor
> Labels: portability
> Time Spent: 9h 20m
> Remaining Estimate: 0h
>