[
https://issues.apache.org/jira/browse/FLINK-39807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Guowei Ma updated FLINK-39807:
------------------------------
Description:
This is the umbrella issue tracking FLIP-577: AI-Native Flink — An Umbrella
Proposal for Multimodal Data Processing.
_FLIP:_
[FLIP-577|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=421957275]
_Discussion thread:_ [dev@ thread|https://lists.apache.org/thread/l5obwo4kvjblqnx24btsv4rwjwoflb6d]
User workloads are shifting from BI analytics to multimodal data processing
centered on model inference: data expands from structured records to images,
video, audio, and embeddings; resources move from CPU to mixed CPU/GPU; and
execution moves from row-oriented to vectorized batch processing. This
umbrella proposes evolving Flink from a unified stream-batch compute engine
into one that natively supports AI workloads (AI-Native). The work is
decomposed into 11 sub-FLIPs across three layers:
* _Layer 1 — Core Runtime Primitives:_ RpcOperator; multimodal type system and
OBJECT_REF.
* _Layer 2 — Workload Expression and Execution:_ Python DataFrame API;
multimodal Source/Sink connector API; GPU resource declaration and independent
deployment; built-in multimodal operators and AI functions; Arrow columnar
transport.
* _Layer 3 — Production-Grade Operational Guarantees:_ non-disruptive scaling
for CPU and GPU operators; Unaligned Checkpoint enhancements;
Pipeline-Region-based independent checkpoints.
Most sub-FLIPs have no hard dependencies on each other and can proceed in
parallel. This umbrella seeks consensus on the overall direction only;
detailed design and APIs are deferred to each sub-FLIP. All changes are
incremental. Sub-FLIPs will be tracked as separate issues and linked here.
was:
This is the umbrella issue tracking FLIP-577: AI-Native Flink — An Umbrella
Proposal for Multimodal Data Processing.
_FLIP:_
[FLIP-577|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=421957275]
_Discussion thread:_ [dev@ thread|https://lists.apache.org/thread/l5obwo4kvjblqnx24btsv4rwjwoflb6d]
User workloads are shifting from BI analytics to multimodal data processing
centered on model inference — data expands from structured records to
images/video/audio/embeddings, resources move from CPU to mixed CPU/GPU, and
execution moves from row-oriented to vectorized batch. This umbrella proposes
evolving Flink from a unified stream-batch compute engine into one that
natively supports AI workloads (AI-Native), decomposed into 11 sub-FLIPs across
three layers:
* _Layer 1 — Core Runtime Primitives:_ RpcOperator; multimodal type system and
OBJECT_REF.
* _Layer 2 — Workload Expression and Execution:_ Python DataFrame API;
multimodal Source/Sink connector API; GPU resource declaration and independent
deployment; built-in multimodal operators and AI functions; Arrow columnar
transport.
* _Layer 3 — Production-Grade Operational Guarantees:_ non-disruptive scaling
for CPU and GPU operators; Unaligned Checkpoint enhancements;
Pipeline-Region-based independent checkpoints.
Most sub-FLIPs have no hard dependencies and can be advanced in parallel. This
umbrella seeks consensus on the overall direction only; detailed design and
APIs are deferred to each sub-FLIP. All changes are incremental. Sub-FLIPs will
be tracked as separate issues and linked here.
> [umbrella] FLIP-577: AI-Native Flink — Umbrella for Multimodal Data Processing
> ------------------------------------------------------------------------------
>
> Key: FLINK-39807
> URL: https://issues.apache.org/jira/browse/FLINK-39807
> Project: Flink
> Issue Type: New Feature
> Reporter: Guowei Ma
> Assignee: Guowei Ma
> Priority: Major