[ 
https://issues.apache.org/jira/browse/FLINK-34982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weijie Guo updated FLINK-34982:
-------------------------------
    Affects Version/s: 2.1.0

> FLIP-428: Fault Tolerance/Rescale Integration for Disaggregated State
> ---------------------------------------------------------------------
>
>                 Key: FLINK-34982
>                 URL: https://issues.apache.org/jira/browse/FLINK-34982
>             Project: Flink
>          Issue Type: New Feature
>          Components: Runtime / Checkpointing, Runtime / State Backends
>    Affects Versions: 2.1.0
>            Reporter: Jinzhong Li
>            Priority: Major
>             Fix For: 2.0.0
>
>
> This is a sub-FLIP for the disaggregated state management and its related 
> work, please read the [FLIP-423|https://cwiki.apache.org/confluence/x/R4p3EQ] 
> first to know the whole story.
> As outlined in [FLIP-423|https://cwiki.apache.org/confluence/x/R4p3EQ] [1] 
> and [FLIP-427|https://cwiki.apache.org/confluence/x/T4p3EQ] [2], we proposed 
> to disaggregate StateManagement and introduced a disaggregated state storage 
> named ForSt, which evolves from RocksDB. Within the new framework, where the 
> primary storage is placed on the remote file system, several challenges 
> emerge when attempting to reuse the existing fault-tolerance mechanisms of 
> local RocksDB:
>  * Because most remote file system don't support hard-link, ForSt can't 
> utilize hard-link to capture a consistent snapshot during checkpoint 
> synchronous phase as rocksdb currently does.
>  * The existing file transfer mechanism within RocksDB is inefficient during 
> checkpoints; it involves first downloading the remote working state data to 
> local memory and then uploading it to the checkpoint directory. Likewise, 
> both restore and rescale face the similar problems due to superfluous data 
> transmission.
> In order to solve the above problems and improve checkpoint/restore/rescaling 
> performance of disaggregated storage, this FLIP proposes:
>  # A new checkpoint strategy for disaggregated state storage: leverage 
> RocksDB's low-level api to retain a consistent snapshot during the checkpoint 
> synchronous phase; and then transfer the snapshot files to checkpoint 
> directory during asynchronous phase;
>  # Accelerating checkpoint/restore/rescaling by leverage fast-duplication of 
> remote file system, which can bypass the local TaskManager when transferring 
> data between remote working directory and checkpoint directory.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to