[
https://issues.apache.org/jira/browse/FLINK-34973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zakelly Lan updated FLINK-34973:
--------------------------------
Description:
This is a sub-FLIP of the disaggregated state management effort and its related
work; please read FLIP-423 first for the full context.
FLIP-424 introduces asynchronous state APIs with callbacks, allowing state
access to be executed in threads separate from the task thread, which makes
better use of I/O bandwidth and improves throughput. This FLIP proposes an
execution framework for those asynchronous state APIs. The execution code path
for the new API is completely independent of the original one, and many runtime
components are redesigned for it. We intend to examine the challenges associated
with asynchronous execution and provide an in-depth design analysis for each
module. Furthermore, we will conduct a performance analysis of the new
framework relative to the current implementation and examine how it compares
with other potential alternatives.
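To make the callback style described above concrete, here is a minimal Java
sketch. It is not Flink's API: the class AsyncStateSketch, the methods asyncGet,
asyncPut and processElement, and the in-memory map standing in for remote state
are all hypothetical names chosen for illustration. It only shows the general
idea of issuing state access on separate I/O threads and running the follow-up
callback on a single task thread.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch, not Flink code: all names here are assumptions.
public class AsyncStateSketch {
    // In-memory map standing in for a (possibly remote) state store.
    private final Map<String, Long> store = new ConcurrentHashMap<>();
    // Threads that perform state I/O, separate from the task thread.
    private final ExecutorService ioPool = Executors.newFixedThreadPool(4);
    // Single thread standing in for the task thread that runs user callbacks.
    private final ExecutorService taskThread = Executors.newSingleThreadExecutor();

    // Asynchronous read: the lookup runs on the I/O pool.
    public CompletableFuture<Long> asyncGet(String key) {
        return CompletableFuture.supplyAsync(() -> store.getOrDefault(key, 0L), ioPool);
    }

    // Asynchronous write: the update runs on the I/O pool.
    public CompletableFuture<Void> asyncPut(String key, long value) {
        return CompletableFuture.runAsync(() -> store.put(key, value), ioPool);
    }

    // Counts one record for the key: read, increment in a callback on the
    // task thread, then write back. Returns the updated count.
    public CompletableFuture<Long> processElement(String key) {
        return asyncGet(key)
                .thenApplyAsync(count -> count + 1, taskThread)
                .thenCompose(updated -> asyncPut(key, updated).thenApply(v -> updated));
    }

    public void shutdown() {
        ioPool.shutdown();
        taskThread.shutdown();
    }

    public static void main(String[] args) {
        AsyncStateSketch sketch = new AsyncStateSketch();
        // Each call is joined before the next one starts, so the two
        // accesses to the same key cannot interleave in this demo.
        long first = sketch.processElement("a").join();
        long second = sketch.processElement("a").join();
        System.out.println(first + " " + second); // prints "1 2"
        sketch.shutdown();
    }
}
```

Note that the sketch does nothing to order concurrent accesses to the same key;
the demo avoids the problem only by waiting for each record to finish before
submitting the next.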
was:
The past decade has witnessed a dramatic shift in Flink's deployment modes,
workload patterns, and underlying hardware. We've moved from the map-reduce
era, where workers were nodes with tightly coupled computation and storage, to a
cloud-native world where containerized deployments on Kubernetes have become
standard. To enable Flink's cloud-native future, we introduce Disaggregated
State Storage and Management, which uses DFS as primary storage, in Flink 2.0,
as promised in the Flink 2.0 Roadmap.
Detailed design and story:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=293046855
Also sub-FLIPs:
- Asynchronous State APIs
([FLIP-424|https://cwiki.apache.org/confluence/x/SYp3EQ]): Introduce new APIs
for asynchronous state access.
- Asynchronous Execution Model
([FLIP-425|https://cwiki.apache.org/confluence/x/S4p3EQ]): Implement a
non-blocking execution model leveraging the asynchronous APIs introduced in
FLIP-424.
- Grouping Remote State Access
([FLIP-426|https://cwiki.apache.org/confluence/x/TYp3EQ]): Enable retrieval of
remote state data in batches to avoid unnecessary round-trip costs for remote
access.
- Disaggregated State Store
([FLIP-427|https://cwiki.apache.org/confluence/x/T4p3EQ]): Introduce the
initial version of the ForSt disaggregated state store.
- Fault Tolerance/Rescale Integration
([FLIP-428|https://cwiki.apache.org/confluence/x/UYp3EQ]): Integrate
checkpointing mechanisms with the disaggregated state store for fault tolerance
and fast rescaling.
> FLIP-425: Asynchronous Execution Model
> --------------------------------------
>
> Key: FLINK-34973
> URL: https://issues.apache.org/jira/browse/FLINK-34973
> Project: Flink
> Issue Type: New Feature
> Reporter: Zakelly Lan
> Priority: Major
> Fix For: 2.0.0