[ https://issues.apache.org/jira/browse/FLINK-34982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Weijie Guo updated FLINK-34982: ------------------------------- Affects Version/s: 2.1.0 > FLIP-428: Fault Tolerance/Rescale Integration for Disaggregated State > --------------------------------------------------------------------- > > Key: FLINK-34982 > URL: https://issues.apache.org/jira/browse/FLINK-34982 > Project: Flink > Issue Type: New Feature > Components: Runtime / Checkpointing, Runtime / State Backends > Affects Versions: 2.1.0 > Reporter: Jinzhong Li > Priority: Major > Fix For: 2.0.0 > > > This is a sub-FLIP for the disaggregated state management and its related > work, please read the [FLIP-423|https://cwiki.apache.org/confluence/x/R4p3EQ] > first to know the whole story. > As outlined in [FLIP-423|https://cwiki.apache.org/confluence/x/R4p3EQ] [1] > and [FLIP-427|https://cwiki.apache.org/confluence/x/T4p3EQ] [2], we proposed > to disaggregate StateManagement and introduced a disaggregated state storage > named ForSt, which evolves from RocksDB. Within the new framework, where the > primary storage is placed on the remote file system, several challenges > emerge when attempting to reuse the existing fault-tolerance mechanisms of > local RocksDB: > * Because most remote file system don't support hard-link, ForSt can't > utilize hard-link to capture a consistent snapshot during checkpoint > synchronous phase as rocksdb currently does. > * The existing file transfer mechanism within RocksDB is inefficient during > checkpoints; it involves first downloading the remote working state data to > local memory and then uploading it to the checkpoint directory. Likewise, > both restore and rescale face the similar problems due to superfluous data > transmission. > In order to solve the above problems and improve checkpoint/restore/rescaling > performance of disaggregated storage, this FLIP proposes: > # A new checkpoint strategy for disaggregated state storage: leverage > RocksDB's low-level api to retain a consistent snapshot during the checkpoint > synchronous phase; and then transfer the snapshot files to checkpoint > directory during asynchronous phase; > # Accelerating checkpoint/restore/rescaling by leverage fast-duplication of > remote file system, which can bypass the local TaskManager when transferring > data between remote working directory and checkpoint directory. -- This message was sent by Atlassian Jira (v8.20.10#820010)