[ 
https://issues.apache.org/jira/browse/FLINK-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Qin updated FLINK-4266:
----------------------------
    Description: 
Current FileSystem statebackend limits whole state size to disk space. Dealing 
with scale out checkpoint states beyond local disk space such as long running 
task that hold window content for long period of time. Pipelines needs to split 
out states to durable remote storage even replicated to different data centers.

We draft a design that leverage checkpoint id as mono incremental logic 
timestamp and perform range query to get evicited state k/v. we also introduce 
checkpoint time commit and eviction threshold that reduce "hot states" hitting 
remote db per every update between adjacent checkpoints by tracking update logs 
and merge them, do batch updates only when checkpoint; lastly, we are looking 
for eviction policy that can identify "hot keys" in k/v state and lazy load 
those "cold keys" from remote storage(e.g Cassandra).

For now, we don't have good story regarding to data retirement. We might have 
to keep forever until manually run command and clean per job related state 
data. Some of features might related to incremental checkpointing feature, we 
hope to align with effort there also.

Welcome comments, I will try to put a draft design doc after gathering some 
feedback.







  was:
Current FileSystem statebackend limits whole state size to disk space. 
For long running task that hold window content for long period of time, it 
needs to split out states to durable remote storage and replicated across data 
centers.

We look into implementation from leverage checkpoint timestamp as versioning 
and do range query to get current state; we also want to reduce "hot states" 
hitting remote db per every update between adjacent checkpoints by tracking 
update logs and merge them, do batch updates only when checkpoint; lastly, we 
are looking for eviction policy that can identify "hot keys" in k/v state and 
lazy load those "cold keys" from Cassandra.

For now, we don't have good story regarding to data retirement. We might have 
to keep forever until manually run command and clean per job related state 
data. Some of features might related to incremental checkpointing feature, we 
hope to align with effort there also.

Welcome comments, I will try to put a draft design doc after gathering some 
feedback.








> Remote Storage Statebackend
> ---------------------------
>
>                 Key: FLINK-4266
>                 URL: https://issues.apache.org/jira/browse/FLINK-4266
>             Project: Flink
>          Issue Type: New Feature
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.0.3, 1.2.0
>            Reporter: Chen Qin
>            Priority: Minor
>
> Current FileSystem statebackend limits whole state size to disk space. 
> Dealing with scale out checkpoint states beyond local disk space such as long 
> running task that hold window content for long period of time. Pipelines 
> needs to split out states to durable remote storage even replicated to 
> different data centers.
> We draft a design that leverage checkpoint id as mono incremental logic 
> timestamp and perform range query to get evicited state k/v. we also 
> introduce checkpoint time commit and eviction threshold that reduce "hot 
> states" hitting remote db per every update between adjacent checkpoints by 
> tracking update logs and merge them, do batch updates only when checkpoint; 
> lastly, we are looking for eviction policy that can identify "hot keys" in 
> k/v state and lazy load those "cold keys" from remote storage(e.g Cassandra).
> For now, we don't have good story regarding to data retirement. We might have 
> to keep forever until manually run command and clean per job related state 
> data. Some of features might related to incremental checkpointing feature, we 
> hope to align with effort there also.
> Welcome comments, I will try to put a draft design doc after gathering some 
> feedback.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to