Hangxiang Yu created FLINK-34975:
------------------------------------

             Summary: FLIP-427: ForSt - Disaggregated state Store
                 Key: FLINK-34975
                 URL: https://issues.apache.org/jira/browse/FLINK-34975
             Project: Flink
          Issue Type: New Feature
          Components: Runtime / State Backends
            Reporter: Hangxiang Yu
            Assignee: Hangxiang Yu


This is a sub-FLIP of the disaggregated state management effort and its related 
work. Please read [FLIP-423|https://cwiki.apache.org/confluence/x/R4p3EQ] first 
for the full picture.

As described in FLIP-423, an embedded state backend on the local file system 
faces some tough issues, especially when dealing with extremely large state:
 # *Constraints of local disk space complicate the prediction of storage 
requirements, potentially leading to job failures:* Especially in cloud-native 
deployments, pre-allocated local disks typically face strict 
capacity constraints, making it challenging to forecast the size requirements 
of job states. Over-provisioning disk space results in unnecessary resource 
overhead, while under-provisioning risks job failure due to insufficient space.
 # *The tight coupling of compute and storage resources leads to 
underutilization and increased waste:* Jobs can generally be categorized as 
either CPU-intensive or IO-intensive. In a coupled architecture, CPU-intensive 
jobs leave a significant portion of storage resources underutilized, whereas 
IO-intensive jobs result in idle computing resources.

When remote storage is treated as the primary storage, all working state is 
maintained on the remote file system, which brings several advantages:
 # *Remote storage such as S3 or HDFS typically offers elastic scalability, 
providing theoretically unlimited space.*
 # *The allocation of remote storage resources can be optimized by reducing 
them for CPU-intensive jobs and augmenting them for IO-intensive jobs, thus 
enhancing overall resource utilization.*
 # *This architecture facilitates a highly efficient and lightweight process 
for checkpointing, recovery, and rescaling through fast copy or simple move.*

This FLIP aims to realize disaggregated state for our new key-value store named 
*ForSt*, which evolves from RocksDB and supports remote file systems. This lets 
Flink shed the disadvantages of the coupled state architecture and embrace 
scalable, flexible cloud-native storage.

Please see [FLIP-427|https://cwiki.apache.org/confluence/x/T4p3EQ] for more 
details.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
