[jira] [Commented] (FLINK-3761) Introduce key group state backend

ASF GitHub Bot (JIRA) Tue, 24 May 2016 07:31:55 -0700

    [ 
https://issues.apache.org/jira/browse/FLINK-3761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15298250#comment-15298250
 ]


ASF GitHub Bot commented on FLINK-3761:
---------------------------------------

Github user aljoscha commented on the pull request:

    https://github.com/apache/flink/pull/1988#issuecomment-221289448
  
    I started looking into it, but man this is one big change... 😃 
    
    I have some first remarks about API and internals:
    
    What's the reason for the introduction of `PartitionedState`? The Javadoc 
for `State` already says that it is the base class for partitioned state and 
that it is only usable on a `KeyedStream`.
    
    The signatures of `KeyGroupedStateBackend` and `PartitionedStateBackend` are 
exactly the same. `AbstractStateBackend` has both methods, 
`createPartitionedStateBackend` and `createKeyGroupStateBackend`. Users of an 
`AbstractStateBackend` should only ever call the latter, while the former is 
reserved for internal use by `GenericKeyGroupStateBackend`, the default 
implementation of `KeyGroupedStateBackend`. Also, `AbstractStreamOperator` has 
the new method `getKeyGroupStateBackend`, which operators such as the 
`WindowOperator` should use to deal with partitioned state. Now, where am I 
going with this? I think `AbstractStateBackend` should expose only one 
externally visible method, `createPartitionedStateBackend`. 
`AbstractStreamOperator` would use it to create a state backend, and users of 
the interface, e.g. `WindowOperator`, would also deal only with 
`PartitionedStateBackend`, which they get from 
`AbstractStreamOperator.getPartitionedStateBackend`. The existence of key 
groups should not be visible to users of a state backend. Internally, state 
backends would use the `GenericKeyGroupStateBackend`, and they could provide 
an interface to it for creating non-key-grouped backends.
    
    The "exactly the same" above is not 100% correct, since the snapshot/restore 
methods differ slightly, but I think this could be worked around. Also, I found 
it quite hard to express what I actually mean, but I hope you get my point. 😅 
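
    A minimal sketch of the layering proposed above may make the point 
easier to follow. It is not taken from the pull request: the interfaces are 
simplified, hypothetical stand-ins (the `get`/`put` methods, `PlainBackend`, 
and the static factory are invented for illustration), and assigning a key to 
a group via `hashCode` modulo the number of key groups is an assumption. The 
idea it shows is that callers only ever see a `PartitionedStateBackend`, 
while the key-group routing stays inside the backend.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-ins; these are NOT the actual Flink signatures.
class KeyGroupSketch {

    /** The only backend type that operators such as WindowOperator would see. */
    interface PartitionedStateBackend<K> {
        void setCurrentKey(K key);
        Integer get(String stateName);
        void put(String stateName, Integer value);
    }

    /** Plain, non-key-grouped backend (invented helper): one state map per key. */
    static class PlainBackend<K> implements PartitionedStateBackend<K> {
        private final Map<K, Map<String, Integer>> states = new HashMap<>();
        private K currentKey;

        public void setCurrentKey(K key) {
            currentKey = key;
        }

        public Integer get(String stateName) {
            Map<String, Integer> perKey = states.get(currentKey);
            return perKey == null ? null : perKey.get(stateName);
        }

        public void put(String stateName, Integer value) {
            states.computeIfAbsent(currentKey, k -> new HashMap<String, Integer>())
                  .put(stateName, value);
        }
    }

    /** Internal detail: routes each key to the plain backend of its key group. */
    static class GenericKeyGroupStateBackend<K> implements PartitionedStateBackend<K> {
        private final int numKeyGroups;
        private final Map<Integer, PartitionedStateBackend<K>> groups = new HashMap<>();
        private PartitionedStateBackend<K> current;

        GenericKeyGroupStateBackend(int numKeyGroups) {
            this.numKeyGroups = numKeyGroups;
        }

        // Assumption: key group = non-negative hash modulo the number of groups.
        int keyGroupOf(K key) {
            return Math.floorMod(key.hashCode(), numKeyGroups);
        }

        public void setCurrentKey(K key) {
            current = groups.computeIfAbsent(keyGroupOf(key), g -> new PlainBackend<K>());
            current.setCurrentKey(key);
        }

        public Integer get(String stateName) {
            return current.get(stateName);
        }

        public void put(String stateName, Integer value) {
            current.put(stateName, value);
        }
    }

    /** Stand-in for the single externally visible factory method. */
    static <K> PartitionedStateBackend<K> createPartitionedStateBackend(int numKeyGroups) {
        return new GenericKeyGroupStateBackend<K>(numKeyGroups);
    }

    public static void main(String[] args) {
        // The caller never mentions key groups; it only uses the partitioned interface.
        PartitionedStateBackend<String> backend = createPartitionedStateBackend(4);
        backend.setCurrentKey("user-a");
        backend.put("count", 1);
        backend.setCurrentKey("user-b");
        backend.put("count", 2);
        backend.setCurrentKey("user-a");
        System.out.println("count for user-a: " + backend.get("count"));
    }
}
```

    In this sketch the snapshot/restore methods are left out entirely, so it 
does not address the small signature difference mentioned above.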


> Introduce key group state backend
> ---------------------------------
>
>                 Key: FLINK-3761
>                 URL: https://issues.apache.org/jira/browse/FLINK-3761
>             Project: Flink
>          Issue Type: Sub-task
>          Components: state backends
>            Reporter: Till Rohrmann
>            Assignee: Till Rohrmann
>
> After an offline discussion with [~aljoscha], we came to the conclusion that 
> it would be beneficial to reflect the differences between a keyed and a 
> non-keyed stream also in the state backends. A state backend which is used 
> for a keyed stream offers a value, list and folding state and has to 
> group its keys into key groups. 
> A state backend for non-keyed streams can only offer a union state to make it 
> work with dynamic scaling. A union state is a state which is broadcast to 
> all tasks in case of a recovery. The state backends can then select what 
> information they need to recover from the whole state (formerly distributed).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
