[
https://issues.apache.org/jira/browse/FLINK-3761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15298250#comment-15298250
]
ASF GitHub Bot commented on FLINK-3761:
---------------------------------------
Github user aljoscha commented on the pull request:
https://github.com/apache/flink/pull/1988#issuecomment-221289448
I started looking into it, but man this is one big change... 😃
I have some first remarks about API and internals:
What's the reason for the introduction of `PartitionedState`? The Javadoc
for `State` already says that it is the base class for partitioned state and
that it is only usable on a `KeyedStream`.
The signature of `KeyGroupedStateBackend` and `PartitionedStateBackend` is
exactly the same. `AbstractStateBackend` has both methods,
`createPartitionedStateBackend` and `createKeyGroupStateBackend`. Users of an
`AbstractStateBackend` should only ever call the latter while the former is
reserved for internal use by the default implementation for
`KeyGroupedStateBackend` which is `GenericKeyGroupStateBackend`. Also,
`AbstractStreamOperator` has the new method `getKeyGroupStateBackend` that
should be used by operators such as the `WindowOperator` to deal with
partitioned state. Now, where am I going with this? I think that
`AbstractStateBackend` should have only one externally visible method,
`createPartitionedStateBackend`. This would be used
by the `AbstractStreamOperator` to create a state backend, and users of the
interface, e.g. `WindowOperator`, would also deal just with
`PartitionedStateBackend`, which they get from
`AbstractStreamOperator.getPartitionedStateBackend`. The fact that there are
these key groups should not be visible to users of a state backend. Internally,
state backends would use the `GenericKeyGroupStateBackend`, and they could provide
an interface to it for creating non-key-grouped backends.
Above, "exactly the same" is not 100% correct, since the snapshot/restore
methods differ slightly, but I think this could be worked around. Also, I found
it quite hard to express what I actually mean, but I hope you get my point. 😅
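As a rough illustration of the shape proposed above, here is a minimal sketch. The types are simplified stand-ins that reuse the names from this discussion; they are not Flink's actual signatures. `InMemoryBackend`, `MapStateBackend`, `createNonKeyGroupedBackend`, the `put`/`get` methods and the hash-modulo key-group assignment are all assumptions made purely for the example, and snapshot/restore is left out entirely.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified stand-in: the only backend type an operator would see. */
interface PartitionedStateBackend<K> {
    void setCurrentKey(K key);
    void put(String stateName, Object value);
    Object get(String stateName);
}

/** Hypothetical plain backend that knows nothing about key groups. */
class MapStateBackend<K> implements PartitionedStateBackend<K> {
    private final Map<K, Map<String, Object>> stateByKey = new HashMap<>();
    private K currentKey;

    public void setCurrentKey(K key) { currentKey = key; }

    public void put(String stateName, Object value) {
        stateByKey.computeIfAbsent(currentKey, k -> new HashMap<>()).put(stateName, value);
    }

    public Object get(String stateName) {
        Map<String, Object> states = stateByKey.get(currentKey);
        return states == null ? null : states.get(stateName);
    }
}

/** Internal wrapper: one plain backend per key group, same outward interface. */
class GenericKeyGroupStateBackend<K> implements PartitionedStateBackend<K> {
    private final List<PartitionedStateBackend<K>> groups = new ArrayList<>();
    private PartitionedStateBackend<K> current;

    GenericKeyGroupStateBackend(AbstractStateBackend factory, int numKeyGroups) {
        for (int i = 0; i < numKeyGroups; i++) {
            groups.add(factory.<K>createNonKeyGroupedBackend());
        }
    }

    /** Assumed assignment rule: hash of the key modulo the number of key groups. */
    int keyGroupOf(K key) {
        return Math.floorMod(key.hashCode(), groups.size());
    }

    public void setCurrentKey(K key) {
        // Route all subsequent state access to the backend of the key's group.
        current = groups.get(keyGroupOf(key));
        current.setCurrentKey(key);
    }

    public void put(String stateName, Object value) { current.put(stateName, value); }

    public Object get(String stateName) { return current.get(stateName); }
}

abstract class AbstractStateBackend {
    /** The single externally visible factory method. */
    public <K> PartitionedStateBackend<K> createPartitionedStateBackend(int numKeyGroups) {
        return new GenericKeyGroupStateBackend<K>(this, numKeyGroups);
    }

    /** Internal hook (package-private) for creating a non-key-grouped backend. */
    abstract <K> PartitionedStateBackend<K> createNonKeyGroupedBackend();
}

/** Hypothetical concrete backend used only to exercise the sketch. */
class InMemoryBackend extends AbstractStateBackend {
    <K> PartitionedStateBackend<K> createNonKeyGroupedBackend() {
        return new MapStateBackend<K>();
    }
}
```

In this sketch an operator would only call `createPartitionedStateBackend` and work against `PartitionedStateBackend`; the key groups stay an implementation detail behind that interface.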
> Introduce key group state backend
> ---------------------------------
>
> Key: FLINK-3761
> URL: https://issues.apache.org/jira/browse/FLINK-3761
> Project: Flink
> Issue Type: Sub-task
> Components: state backends
> Reporter: Till Rohrmann
> Assignee: Till Rohrmann
>
> After an off-line discussion with [~aljoscha], we came to the conclusion that
> it would be beneficial to reflect the differences between a keyed and a
> non-keyed stream also in the state backends. A state backend which is used
> for a keyed stream offers a value, list, folding and value state and has to
> group its keys into key groups.
> A state backend for non-keyed streams can only offer a union state to make it
> work with dynamic scaling. A union state is a state which is broadcasted to
> all tasks in case of a recovery. The state backends can then select what
> information they need to recover from the whole state (formerly distributed).