[ https://issues.apache.org/jira/browse/FLINK-2808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948855#comment-14948855 ]

ASF GitHub Bot commented on FLINK-2808:
---------------------------------------

Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/1239#issuecomment-146581025
  
    Thanks for the fast and good feedback.
    
    Concerning the removal of non-partitioned operator state:
      - As discussed offline, I wanted to consolidate this into a core feature
        set, since it is hard to remove features once they have been released
        and endorsed.
      - For simple states, let us add an annotation `@State` (with an optional
        checkpointer/serializer, e.g. `@State(checkpointer=new BloomFilterCheckpointer())`),
        which serves the same purpose and would be even more lightweight. I would
        put that on the list for the next release.
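To make the `@State` idea concrete, here is a minimal Java sketch. Everything in it is hypothetical: the `State` annotation, the `Checkpointer` interface, the `LongCheckpointer` and `SerializingCheckpointer` classes, and the reflective `snapshot`/`restore` helpers are illustrations, not Flink API. One detail necessarily differs from the example above: Java annotation elements must be compile-time constants, class literals, enums, or other annotations, so the checkpointer is given as a class literal (`LongCheckpointer.class`) and instantiated at runtime rather than written as `new ...()`.

```java
import java.io.*;
import java.lang.annotation.*;
import java.lang.reflect.Field;
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

public class StateAnnotationSketch {

    /** Hypothetical: turns one state value into bytes and back. */
    interface Checkpointer<T> {
        byte[] snapshot(T value);
        T restore(byte[] bytes);
    }

    /** Hypothetical default checkpointer: plain Java serialization. */
    static final class SerializingCheckpointer implements Checkpointer<Object> {
        public byte[] snapshot(Object value) {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return bytes.toByteArray(); // closing `out` flushed everything into `bytes`
        }
        public Object restore(byte[] bytes) {
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
                return in.readObject();
            } catch (IOException | ClassNotFoundException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    /** Hypothetical compact checkpointer for long counters (8 bytes each). */
    static final class LongCheckpointer implements Checkpointer<Long> {
        public byte[] snapshot(Long value) { return ByteBuffer.allocate(Long.BYTES).putLong(value).array(); }
        public Long restore(byte[] bytes) { return ByteBuffer.wrap(bytes).getLong(); }
    }

    /** Hypothetical field annotation; the checkpointer is a class literal, instantiated at runtime. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface State {
        Class<? extends Checkpointer> checkpointer() default SerializingCheckpointer.class;
    }

    @SuppressWarnings("unchecked")
    private static Checkpointer<Object> checkpointerFor(Field field) throws ReflectiveOperationException {
        return (Checkpointer<Object>) field.getAnnotation(State.class).checkpointer()
                .getDeclaredConstructor().newInstance();
    }

    /** Snapshots every @State field of a function object, keyed by field name. */
    static Map<String, byte[]> snapshot(Object function) throws ReflectiveOperationException {
        Map<String, byte[]> result = new HashMap<>();
        for (Field field : function.getClass().getDeclaredFields()) {
            if (field.isAnnotationPresent(State.class)) {
                field.setAccessible(true);
                result.put(field.getName(), checkpointerFor(field).snapshot(field.get(function)));
            }
        }
        return result;
    }

    /** Writes snapshotted values back into the @State fields of a (fresh) function object. */
    static void restore(Object function, Map<String, byte[]> saved) throws ReflectiveOperationException {
        for (Field field : function.getClass().getDeclaredFields()) {
            byte[] bytes = saved.get(field.getName());
            if (field.isAnnotationPresent(State.class) && bytes != null) {
                field.setAccessible(true);
                field.set(function, checkpointerFor(field).restore(bytes));
            }
        }
    }

    /** Example user function: only the annotated fields are part of the snapshot. */
    static final class CountingFunction {
        @State(checkpointer = LongCheckpointer.class) long count;
        @State String lastKey = "";
        long scratch; // not annotated, so never checkpointed
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        CountingFunction function = new CountingFunction();
        function.count = 42;
        function.lastKey = "user-7";
        function.scratch = 99;

        Map<String, byte[]> saved = snapshot(function);
        CountingFunction recovered = new CountingFunction();
        restore(recovered, saved);
        System.out.println(recovered.count + " " + recovered.lastKey + " " + recovered.scratch);
    }
}
```

Running `main` prints `42 user-7 0`: both annotated fields are restored, while the unannotated `scratch` field keeps its default value.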
    
    Multiple key/value states per operator: I will re-add that (it is a simple
    change); it is in fact very useful.
    
    Closing the state backend: good idea, I will add that.
    
    Concerning context info during checkpointing:
      - You should have access to the RuntimeContext when drawing the snapshot,
        which means you can access the task index. Let us also add the JobVertexID
        to the RuntimeContext. Both values stay deterministic across restarts
        (state handles are stored under JobVertexID + task index).
    
      - I was thinking about giving access to the state backend in the
        `Checkpointed` interface methods, but let's design that for the next
        release.
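As an illustration of why JobVertexID plus task index works as a lookup key for state handles, here is a minimal Java sketch. `TaskStateKey` and `HandleRegistry` are hypothetical names, the handle strings are placeholders, and a plain `String` stands in for Flink's `JobVertexID`; the only assumption carried over from the comment is that both parts of the key are the same before and after a restart.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class TaskStateKeySketch {

    /** Hypothetical key; stable across restarts because both of its parts are. */
    static final class TaskStateKey {
        final String jobVertexId; // stands in for Flink's JobVertexID
        final int subtaskIndex;

        TaskStateKey(String jobVertexId, int subtaskIndex) {
            this.jobVertexId = Objects.requireNonNull(jobVertexId);
            this.subtaskIndex = subtaskIndex;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof TaskStateKey)) return false;
            TaskStateKey other = (TaskStateKey) o;
            return subtaskIndex == other.subtaskIndex && jobVertexId.equals(other.jobVertexId);
        }

        @Override
        public int hashCode() {
            return Objects.hash(jobVertexId, subtaskIndex);
        }
    }

    /** Hypothetical registry from each parallel task to the handle of its latest snapshot. */
    static final class HandleRegistry {
        private final Map<TaskStateKey, String> handles = new HashMap<>();

        void put(String jobVertexId, int subtaskIndex, String handle) {
            handles.put(new TaskStateKey(jobVertexId, subtaskIndex), handle);
        }

        String get(String jobVertexId, int subtaskIndex) {
            return handles.get(new TaskStateKey(jobVertexId, subtaskIndex));
        }
    }

    public static void main(String[] args) {
        HandleRegistry registry = new HandleRegistry();
        // During a checkpoint, two parallel subtasks of one vertex register their handles.
        registry.put("vertex-a", 0, "handle-0");
        registry.put("vertex-a", 1, "handle-1");

        // After a restart, subtask 1 is redeployed; it rebuilds the same key from the
        // values in its RuntimeContext and therefore finds the same handle.
        System.out.println(registry.get("vertex-a", 1));
    }
}
```

Because the key is built only from values that survive a restart, a redeployed subtask recomputes the same key and finds its own handle; `main` prints `handle-1`.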
    
    With these issues addressed, any objections to adding this?


> Rework / Extend the StatehandleProvider
> ---------------------------------------
>
>                 Key: FLINK-2808
>                 URL: https://issues.apache.org/jira/browse/FLINK-2808
>             Project: Flink
>          Issue Type: Improvement
>          Components: Streaming
>    Affects Versions: 0.10
>            Reporter: Stephan Ewen
>            Assignee: Stephan Ewen
>             Fix For: 0.10
>
>
> I would like to make some changes (mostly additions) to the 
> {{StateHandleProvider}}, ideally for the upcoming release, as it is somewhat 
> part of the public API.
> The rationale behind this is to handle, in a clean and extensible way, the 
> creation of key/value state backed by various implementations (FS, 
> distributed KV store, local KV store with FS backup, ...) and various 
> checkpointing strategies (full dump, append, incremental keys, ...).
> The changes would concretely be:
> 1.  There should be a default {{StateHandleProvider}} set on the execution 
> environment. Functions can later specify the {{StateHandleProvider}} when 
> grabbing the {{StreamOperatorState}} from the runtime context (plus 
> optionally a {{Checkpointer}})
> 2.  The {{StreamOperatorState}} is created from the {{StateHandleProvider}}. 
> That way, a KeyValueStore state backend can create a {{StreamOperatorState}} 
> that directly updates data in the KV store on every access, if that is 
> desired (filtering accesses by timestamp so that only committed data is visible).
> 3.  The {{StateHandleProvider}} should have methods to get an output stream that 
> writes directly to the state checkpoint (and returns a StateHandle upon 
> closing). That way, we can convert and dump large state into the checkpoint 
> without first creating a full copy in memory.
> Lastly, I would like to change some names:
>   - {{StateHandleProvider}} to either {{StateBackend}}, {{StateStore}}, or 
> {{StateProvider}} (simpler name).
>   - {{StreamOperatorState}} to either {{State}} or {{KVState}}.
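A minimal Java sketch of points 2 and 3, together with the multiple named key/value states per operator and the backend `close()` agreed on in the comment above. All names (`StateBackend`, `KvState`, `StateHandle`, `CheckpointStateOutputStream`, `closeAndGetHandle`, `MemoryStateBackend`) are hypothetical, loosely following the renaming options in the issue, and are not presented as Flink's actual API. The in-memory backend buffers the checkpoint stream in a byte array only to keep the example self-contained; a file-system backend would instead write to a file and return a handle pointing at it, which is what avoids the full in-memory copy described in point 3.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class StateBackendSketch {

    /** Hypothetical handle to one piece of checkpointed state. */
    interface StateHandle {
        byte[] getState();
    }

    /** Hypothetical key/value state; obtained from a backend, never constructed directly (point 2). */
    interface KvState<K, V> {
        V value(K key);
        void update(K key, V value);
    }

    /** Hypothetical stream that writes into the checkpoint and yields a handle once closed (point 3). */
    abstract static class CheckpointStateOutputStream extends OutputStream {
        abstract StateHandle closeAndGetHandle() throws IOException;
    }

    /** Hypothetical backend interface: the single place that creates state and checkpoint streams. */
    interface StateBackend {
        <K, V> KvState<K, V> createKvState(String name);
        CheckpointStateOutputStream createCheckpointStream();
        void close();
    }

    /** In-memory stand-in; an FS or KV-store backend would implement the same interface. */
    static final class MemoryStateBackend implements StateBackend {
        private final Map<String, Map<Object, Object>> statesByName = new HashMap<>();

        @Override
        @SuppressWarnings("unchecked")
        public <K, V> KvState<K, V> createKvState(String name) {
            // Each name gets its own map, so one operator can hold several independent states.
            Map<Object, Object> map = statesByName.computeIfAbsent(name, n -> new HashMap<>());
            return new KvState<K, V>() {
                @Override public V value(K key) { return (V) map.get(key); }
                @Override public void update(K key, V value) { map.put(key, value); }
            };
        }

        @Override
        public CheckpointStateOutputStream createCheckpointStream() {
            return new CheckpointStateOutputStream() {
                // A file-backed backend would write to a checkpoint file here instead of a buffer.
                private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

                @Override public void write(int b) { buffer.write(b); }

                @Override StateHandle closeAndGetHandle() throws IOException {
                    this.close(); // closes this stream, not the backend
                    byte[] bytes = buffer.toByteArray();
                    return () -> bytes;
                }
            };
        }

        @Override
        public void close() {
            statesByName.clear(); // drop the backend's references to all named states
        }
    }

    public static void main(String[] args) throws IOException {
        StateBackend backend = new MemoryStateBackend();
        KvState<String, Long> counts = backend.createKvState("counts");
        KvState<String, String> lastEvent = backend.createKvState("last-event");
        counts.update("alice", 3L);
        lastEvent.update("alice", "login");

        CheckpointStateOutputStream out = backend.createCheckpointStream();
        out.write(("alice=" + counts.value("alice")).getBytes(StandardCharsets.UTF_8));
        StateHandle handle = out.closeAndGetHandle();
        System.out.println(new String(handle.getState(), StandardCharsets.UTF_8));
        backend.close();
    }
}
```

`main` prints `alice=3`. The two `KvState` objects come from the same backend but are keyed by name, so they do not interfere with each other.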



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
