[
https://issues.apache.org/jira/browse/FLINK-34975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Weijie Guo updated FLINK-34975:
-------------------------------
Fix Version/s: 2.1.0
(was: 2.0.0)
> FLIP-427: ForSt - Disaggregated State Store
> -------------------------------------------
>
> Key: FLINK-34975
> URL: https://issues.apache.org/jira/browse/FLINK-34975
> Project: Flink
> Issue Type: New Feature
> Components: Runtime / State Backends
> Reporter: Hangxiang Yu
> Assignee: Hangxiang Yu
> Priority: Major
> Labels: pull-request-available
> Fix For: 2.1.0
>
>
> This is a sub-FLIP of the disaggregated state management effort and its
> related work. Please read [FLIP-423|https://cwiki.apache.org/confluence/x/R4p3EQ]
> first for the full context.
> As described in FLIP-423, an embedded state backend on the local file system
> faces some tough issues, especially when dealing with extremely large
> state:
> # *Constraints of local disk space complicate the prediction of storage
> requirements, potentially leading to job failures:* Especially in
> cloud-native deployment mode, pre-allocated local disks typically face strict
> capacity constraints, making it challenging to forecast the size requirements
> of job states. Over-provisioning disk space results in unnecessary resource
> overhead, while under-provisioning risks job failure due to insufficient
> space.
> # *The tight coupling of compute and storage resources leads to
> underutilization and increased waste:* Jobs can generally be categorized as
> either CPU-intensive or IO-intensive. In a coupled architecture,
> CPU-intensive jobs leave a significant portion of storage resources
> underutilized, whereas IO-intensive jobs result in idle computing resources.
> By considering remote storage as the primary storage, all working states are
> maintained on the remote file system, which brings several advantages:
> # *Remote storage systems such as S3 and HDFS typically offer elastic
> scalability, theoretically providing unlimited space.*
> # *The allocation of remote storage resources can be optimized by reducing
> them for CPU-intensive jobs and augmenting them for IO-intensive jobs, thus
> enhancing overall resource utilization.*
> # *This architecture facilitates a highly efficient and lightweight process
> for checkpointing, recovery, and rescaling through fast copy or simple move.*
> This FLIP aims to realize disaggregated state for our new key-value store
> named *ForSt*, which evolves from RocksDB and supports remote file systems.
> This frees Flink from the disadvantages of the coupled state architecture
> and lets it embrace scalable, flexible cloud-native storage.
> Please see [FLIP-427|https://cwiki.apache.org/confluence/x/T4p3EQ] for more
> details.