[
https://issues.apache.org/jira/browse/FLINK-34982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ruan Hang updated FLINK-34982:
------------------------------
Fix Version/s: 2.3.0
(was: 2.2.0)
> FLIP-428: Fault Tolerance/Rescale Integration for Disaggregated State
> ---------------------------------------------------------------------
>
> Key: FLINK-34982
> URL: https://issues.apache.org/jira/browse/FLINK-34982
> Project: Flink
> Issue Type: New Feature
> Components: Runtime / Checkpointing, Runtime / State Backends
> Affects Versions: 2.1.0
> Reporter: Jinzhong Li
> Priority: Major
> Fix For: 2.3.0
>
>
> This is a sub-FLIP for the disaggregated state management and its related
> work, please read the [FLIP-423|https://cwiki.apache.org/confluence/x/R4p3EQ]
> first to know the whole story.
> As outlined in [FLIP-423|https://cwiki.apache.org/confluence/x/R4p3EQ] [1]
> and [FLIP-427|https://cwiki.apache.org/confluence/x/T4p3EQ] [2], we proposed
> to disaggregate StateManagement and introduced a disaggregated state storage
> named ForSt, which evolves from RocksDB. Within the new framework, where the
> primary storage is placed on the remote file system, several challenges
> emerge when attempting to reuse the existing fault-tolerance mechanisms of
> local RocksDB:
> * Because most remote file system don't support hard-link, ForSt can't
> utilize hard-link to capture a consistent snapshot during checkpoint
> synchronous phase as rocksdb currently does.
> * The existing file transfer mechanism within RocksDB is inefficient during
> checkpoints; it involves first downloading the remote working state data to
> local memory and then uploading it to the checkpoint directory. Likewise,
> both restore and rescale face the similar problems due to superfluous data
> transmission.
> In order to solve the above problems and improve checkpoint/restore/rescaling
> performance of disaggregated storage, this FLIP proposes:
> # A new checkpoint strategy for disaggregated state storage: leverage
> RocksDB's low-level api to retain a consistent snapshot during the checkpoint
> synchronous phase; and then transfer the snapshot files to checkpoint
> directory during asynchronous phase;
> # Accelerating checkpoint/restore/rescaling by leverage fast-duplication of
> remote file system, which can bypass the local TaskManager when transferring
> data between remote working directory and checkpoint directory.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)