Re: Re: [DISCUSS] FLIP-577: AI-Native Flink — An Umbrella Proposal for Multimodal Data Processing

Guowei Ma Thu, 21 May 2026 02:19:07 -0700

Hi Martijn

Thanks for the heads-up.


That sounds good to me. Since the proposal has been stable for a while,
could
you share a rough idea of when you expect to finish your final review? A
tentative date would be enough, and I can hold off starting the vote until
then.

Best,
Guowei


On Thu, May 21, 2026 at 4:47 PM Martijn Visser <[email protected]>
wrote:

> Hi Guowei,
>
> I would like to have a couple of more days to take a closer final look
> with a final round of comments, I haven't been able to finalize
> everything yet on this subject.
>
> Best regards,
>
> Martijn
>
> Op wo 20 mei 2026 om 15:04 schreef Guowei Ma <[email protected]>:
> >
> > Hi all,
> >
> > Thanks everyone for the discussion and feedback on the FLIP.
> >
> > If there are no further comments or concerns, I plan to start the vote on
> > Thursday.
> >
> >
> > Best,
> > Guowei
> >
> >
> > On Fri, May 15, 2026 at 12:21 PM Jiangang Liu <[email protected]
> >
> > wrote:
> >
> > > I Greatly support this proposal. AI-oriented and multimodal data
> processing
> > > is already becoming a real production workload, and Flink is well
> > > positioned to serve as the dataflow backbone for these pipelines.
> > >
> > > The proposal correctly focuses on engine-level capabilities rather than
> > > turning Flink into an AI framework or model serving platform. In
> > > particular, the shared runtime contract across multimodal types, object
> > > references, inference invocation, GPU resource management, scaling, and
> > > recovery is essential. Without such coordination, users would have to
> > > stitch together fragmented abstractions themselves. I also appreciate
> the
> > > incremental and backward-compatible approach, as well as the
> conservative
> > > scope for correctness-sensitive runtime changes.
> > >
> > > This direction can help Flink extend its existing strengths in
> streaming,
> > > checkpointing, and production operations to a new class of AI-driven
> > > workloads.
> > >
> > > Best
> > > Jiangang Liu
> > >
> > >
> > > Lincoln Lee <[email protected]> 于2026年5月11日周一 14:47写道：
> > >
> > > > Hi Guowei and all,
> > > >
> > > > Big +1 on the overall direction of FLIP-577.
> > > >
> > > > From the SQL ecosystem perspective, over the past year the community
> > > > has introduced AI functions (e.g., ML_PREDICT, VECTOR_SEARCH) to help
> > > > users leverage Flink in AI scenarios. Through community interactions
> > > > and enterprise user feedback since then, we can strongly sense the
> > > > growing expectation for Flink to go further — broader multimodal data
> > > > processing capabilities, improved runtime stability under AI
> > > > workloads, and more optimized execution for inference-heavy
> pipelines.
> > > >
> > > > This FLIP addresses exactly those gaps. In particular, I'm excited
> about:
> > > >
> > > >  - Multimodal data type extensions — first-class types like
> > > >   Tensor, Image, Embedding, and Audio in the type system will unlock
> > > >   native SQL/Table expressions over multimodal data, rather than
> > > >   forcing users to work around opaque BYTES columns.
> > > >  - Object reference mechanism — a critical optimization for
> > > >   large-payload multimodal data. Passing multi-megabyte blobs through
> > > >   the standard data path is a known pain point today; a
> > > >   reference-based approach will significantly reduce serialization
> > > >   overhead and memory pressure.
> > > >  - Richer built-in AI functions — standardizing common multimodal
> > > >   operations (embedding generation, similarity search, model
> > > >   inference) as engine-level functions will eliminate the repeated
> > > >   ad-hoc UDF implementations we see across organizations today, and
> > > >   open the door for engine-level optimizations like shared model
> > > >   handles and adaptive batching.
> > > >
> > > > As Jark mentioned, we'd like to participate in and contribute to the
> > > > relevant sub-FLIPs.
> > > >
> > > >
> > > > Best,
> > > > Lincoln Lee
> > > >
> > > >
> > > > Guowei Ma <[email protected]> 于2026年5月10日周日 21:26写道：
> > > >
> > > > > Hi all,
> > > > >
> > > > > Thanks again for the thoughtful feedback and the valuable
> perspectives
> > > > > shared in this discussion.
> > > > >
> > > > > I have updated FLIP-577 [1] based on the discussion in this
> thread. The
> > > > > overall direction remains the same, but I have tried to make the
> scope,
> > > > > motivation, and boundaries clearer.
> > > > >
> > > > > The main changes are:
> > > > >
> > > > >    1.
> > > > >
> > > > >    Clarified the target workload as AI-oriented data processing
> > > > workloads,
> > > > >    instead of relying only on the broader "AI-Native" wording.
> > > > >    2.
> > > > >
> > > > >    Added explicit non-goals to make clear that this proposal does
> not
> > > aim
> > > > >    to turn Flink into an AI framework, ML platform, or model
> serving
> > > > > system.
> > > > >    3.
> > > > >
> > > > >    Added "Why Now" and "Why Flink" sections to better explain the
> > > > >    production signals, ecosystem trends, and why Flink's existing
> > > runtime
> > > > >    strengths are relevant here.
> > > > >    4.
> > > > >
> > > > >    Reworked the umbrella rationale. The key point is not that every
> > > > >    sub-FLIP is AI-specific, but that these mechanisms need a shared
> > > > runtime
> > > > >    contract across data representation, service invocation, GPU
> > > > resources,
> > > > >    scaling, and recovery.
> > > > >    5.
> > > > >
> > > > >    Clarified that engine-level primitives should consider
> SQL/Table,
> > > Java
> > > > >    DataStream, and Python DataFrame.
> > > > >    6.
> > > > >
> > > > >    Made the initial correctness scope of runtime mechanisms —
> > > > >    non-disruptive scaling, UAC enhancements, and Pipeline Region
> > > > > checkpoints —
> > > > >    more conservative, with explicit opt-in where default behavior
> is
> > > > > affected.
> > > > >
> > > > > I also tried to reflect the earlier questions raised in this
> thread in
> > > > the
> > > > > corresponding sections of the updated document.
> > > > >
> > > > > Looking forward to the continued discussion.
> > > > >
> > > > > [1]
> > > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=421957275
> > > > >
> > > > > Best,
> > > > > Guowei
> > > > >
> > > > >
> > > > > On Sun, May 10, 2026 at 4:19 PM Xintong Song <
> [email protected]>
> > > > > wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > First, big thanks to Guowei for kicking off this important
> > > discussion,
> > > > > and
> > > > > > to the community for the substantive engagement over the past
> days.
> > > I'd
> > > > > > also like to share some of my own thoughts.
> > > > > >
> > > > > > My apologies for joining late — I've been tied up with other
> things
> > > and
> > > > > > only recently started catching up. Given the length of this
> thread,
> > > > it's
> > > > > > possible I haven't fully digested every detail. If I miss
> anything,
> > > > > please
> > > > > > bear with me and feel free to point it out.
> > > > > >
> > > > > > Let me state my position upfront: a big +1 to this FLIP. Based
> on the
> > > > > > workload evolution I've been observing, I believe moving Flink
> toward
> > > > > > AI-Native is necessary — this is a project- and community-level
> > > effort
> > > > > that
> > > > > > matters deeply for how Flink responds to the industry shift
> driven by
> > > > AI,
> > > > > > and how it captures the new opportunities that come with it. We
> may
> > > > still
> > > > > > need more discussion to align on specific technical details, but
> at
> > > the
> > > > > > strategic level of where the community should head, I strongly
> > > support
> > > > > this
> > > > > > umbrella.
> > > > > >
> > > > > > Below are my thoughts on a few key questions raised in the
> discussion
> > > > so
> > > > > > far.
> > > > > >
> > > > > >
> > > > > > 1. On the scope of the umbrella
> > > > > >
> > > > > > Robert, Yaroslav, and Martijn have all pointed out that some
> items in
> > > > the
> > > > > > FLIP — Pipeline Region independent checkpointing, UAC
> enhancements,
> > > > > > non-disruptive scaling, etc. — are not unique to AI scenarios and
> > > would
> > > > > > also apply elsewhere. That observation is fair.
> > > > > >
> > > > > > However, I'd argue the right criterion for whether something
> belongs
> > > > > under
> > > > > > this umbrella is not whether it is *exclusively* used for AI, but
> > > > whether
> > > > > > it is *essential* to the central goal here — AI-Native Flink —
> i.e.,
> > > > > > whether AI-Native Flink would actually fall short without it.
> > > > > >
> > > > > > From that angle, I agree with what Guowei has already laid out:
> under
> > > > AI
> > > > > > workload characteristics (per-record processing cost orders of
> > > > magnitude
> > > > > > higher, GPU resources expensive and scarce, long and long-tailed
> > > > > > asynchronous inference), these otherwise-general mechanisms get
> > > pushed
> > > > to
> > > > > > the limit of their current implementations. They may be adequate
> for
> > > > > > traditional BI/ETL workloads, but they become real bottlenecks
> under
> > > AI
> > > > > > workloads. Without them, the AI-Native Flink story would be
> > > incomplete.
> > > > > So
> > > > > > I support including them under this umbrella.
> > > > > >
> > > > > >
> > > > > > 2. On usage and maintenance complexity
> > > > > >
> > > > > > I understand the concerns Yaroslav and Martijn raised about
> > > complexity
> > > > > > costs. That said, I don't think we should conservatively reject a
> > > > > > capability because of complexity, if that capability is critical
> and
> > > > > brings
> > > > > > substantial benefits. On the necessity of RpcOperator
> specifically,
> > > Zhu
> > > > > > Zhu, Guowei, and others have already provided answers earlier in
> this
> > > > > > thread, and I'll add some context from the Flink Agents side
> below.
> > > > > >
> > > > > > As a side observation: with the rapid progress of AI-assisted
> > > > development
> > > > > > over the past year or two, the way engineers work is shifting in
> a
> > > > > > noticeable way. On one hand, AI coding tools are helping
> developers
> > > > > > maintain complex software systems at lower effort. On the other
> hand,
> > > > > more
> > > > > > and more users are starting to ask AI how to use Flink
> correctly, and
> > > > > even
> > > > > > letting AI help them develop and operate Flink workloads. Usage
> and
> > > > > > maintenance complexity still matter, but their relative weight is
> > > > > arguably
> > > > > > going down — this is a trend worth factoring into long-term
> tradeoffs
> > > > at
> > > > > > the community level.
> > > > > >
> > > > > >
> > > > > > 3. On "wait for the industry to stabilize before integrating"
> > > > > >
> > > > > > Yaroslav raised a concern about the timing — the LLM tooling
> > > landscape
> > > > is
> > > > > > changing fast, and it's hard to predict what will be needed
> tomorrow.
> > > > I'd
> > > > > > offer a different perspective.
> > > > > >
> > > > > > It's *precisely because* the industry is evolving fast and
> hasn't yet
> > > > > > settled into a stable shape that Flink should join earlier — to
> > > > influence
> > > > > > and help define how that shape forms. If we just stand by
> waiting for
> > > > the
> > > > > > landscape to settle, by the time it does, there may not be a
> place
> > > for
> > > > > > Flink in it, and we'd have missed an important window.
> > > > > >
> > > > > > In fact, projects like Daft and Ray Data are increasingly
> defining
> > > the
> > > > de
> > > > > > facto standards in this space, as Leonard pointed out earlier in
> this
> > > > > > thread. If Flink doesn't engage proactively, even keeping up
> later
> > > will
> > > > > put
> > > > > > us in a reactive position.
> > > > > >
> > > > > > Engaging early does carry some cost of trial and error — some
> > > > > capabilities
> > > > > > may need to be deprecated or restructured down the line. But I'd
> > > argue
> > > > > this
> > > > > > is a normal part of how open-source projects evolve, and it's
> both
> > > > > > necessary and worth it.
> > > > > >
> > > > > >
> > > > > > 4. RpcOperator from a Flink Agents perspective
> > > > > >
> > > > > > The benefits of RpcOperator have already been articulated by
> several
> > > > > people
> > > > > > earlier in the thread. I'd like to add a concrete scenario from
> the
> > > > Flink
> > > > > > Agents side — the subagent pattern.
> > > > > >
> > > > > >
> > > > > > 4.1 Brief intro to the subagent pattern
> > > > > >
> > > > > > In agent practice, after planning, the main agent often spawns
> one or
> > > > > more
> > > > > > subagents to handle specific subtasks. Each subagent has its own
> LLM
> > > > > > context — its own system prompt, its own context window, its own
> > > > toolset
> > > > > > and permissions. The main agent only receives the final result
> from
> > > the
> > > > > > subagent, not the intermediate steps. This pattern is now widely
> > > > adopted
> > > > > in
> > > > > > the industry, and many agent skills explicitly say "spawn a
> subagent
> > > to
> > > > > do
> > > > > > X".
> > > > > >
> > > > > > Flink Agents wants to support this pattern for several reasons:
> > > > > > compatibility with the existing agent skill ecosystem; context
> > > > isolation,
> > > > > > so that intermediate state during subagent execution doesn't
> pollute
> > > > the
> > > > > > main agent's context; and one additional advantage that
> enterprise
> > > > > > production scenarios have over single-machine personal-assistant
> > > > agents —
> > > > > > the ability to dispatch resource-intensive subtasks to a shared
> > > > resource
> > > > > > pool for efficient resource reuse.
> > > > > >
> > > > > >
> > > > > > 4.2 Why this needs RpcOperator instead of in-place execution
> inside
> > > the
> > > > > > Flink Agents operator
> > > > > >
> > > > > > Why not just spawn subagents directly inside the Flink Agents
> > > operator?
> > > > > > Because subagent workloads are highly bursty — within the same
> job,
> > > one
> > > > > > minute's input might need several subagents for deep processing;
> the
> > > > next
> > > > > > minute's input might need none. Under that kind of randomness,
> > > > statically
> > > > > > allocating subagent resources per operator parallelism leaves no
> good
> > > > > > answer: under-allocate and the operator gets blown out the moment
> > > > demand
> > > > > > arrives; over-allocate and resources sit idle most of the time.
> > > > > >
> > > > > > The natural answer is to put subagent execution into a shared
> > > resource
> > > > > pool
> > > > > > that can be used by multiple upstream operator subtasks, and
> that can
> > > > > scale
> > > > > > elastically based on load. This maps directly onto the design and
> > > core
> > > > > > benefits of RpcOperator that Zhu Zhu laid out earlier — each
> > > > RpcOperator
> > > > > > instance forming its own Pipelined Region, which in turn enables
> > > > > > independent scaling and flexible load balancing.
> > > > > >
> > > > > >
> > > > > > 4.3 Why this shared resource pool should be inside Flink rather
> than
> > > > > > deployed externally
> > > > > >
> > > > > > Two reasons.
> > > > > >
> > > > > > First, the prompts, toolsets, and permission configurations a
> > > subagent
> > > > > > needs at execution time are essentially part of the Flink Agents
> *job
> > > > > > definition*. Stripping them out of the job and deploying them
> > > > separately
> > > > > > effectively splits the job definition in two, leaving users to
> > > > maintain a
> > > > > > synchronization relationship between an external deployment and
> the
> > > > Flink
> > > > > > job. This goes against the developer experience Flink Agents
> aims to
> > > > > > provide.
> > > > > >
> > > > > > Second, subagent execution needs to be incorporated into Flink's
> > > > > checkpoint
> > > > > > mechanism. The notion of "stateful" here deserves a brief
> > > > clarification:
> > > > > > each subagent task is self-contained — the full context it needs
> is
> > > > > handed
> > > > > > over by the main agent at dispatch time in one shot. That is,
> > > subagents
> > > > > > don't need to share state across records, so at the task level
> they
> > > > look
> > > > > > stateless. However, executing a single subagent task takes a
> > > > significant
> > > > > > amount of time and involves multiple rounds of model calls and
> tool
> > > > > calls,
> > > > > > where tool calls may have externally observable side effects
> (sending
> > > > > > emails, writing to external systems, etc.). This means we need
> > > failure
> > > > > > recovery for the in-flight computation state during task
> execution,
> > > so
> > > > > that
> > > > > > subagent execution preserves exactly-once semantics across
> failovers.
> > > > > >
> > > > > >
> > > > > > To wrap up, let me reiterate my support for the overall
> direction of
> > > > this
> > > > > > FLIP. There are clearly technical details still to be worked out,
> > > but I
> > > > > > believe the direction described by this umbrella is both
> necessary
> > > and
> > > > > > important for the long-term evolution of the Flink community.
> Looking
> > > > > > forward to continued discussion in the sub-FLIPs ahead.
> > > > > >
> > > > > > Thanks!
> > > > > >
> > > > > > Best,
> > > > > >
> > > > > > Xintong
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Fri, May 8, 2026 at 3:43 PM Guowei Ma <[email protected]>
> > > wrote:
> > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > Thanks everyone for the in-depth discussion over the past few
> days.
> > > > Let
> > > > > > me
> > > > > > > first summarize my understanding of the discussion so far. I
> see
> > > > strong
> > > > > > > interest in making Flink better support AI-oriented data
> processing
> > > > > > > workloads, especially multimodal and inference-oriented
> pipelines.
> > > At
> > > > > the
> > > > > > > same time, a recurring concern is that RpcOperator and some of
> the
> > > > > Layer
> > > > > > 3
> > > > > > > runtime improvements also look like general Flink
> capabilities, so
> > > > why
> > > > > > > should they be discussed together under the AI-Native /
> multimodal
> > > > > > > processing umbrella?
> > > > > > >
> > > > > > > I think the key question is not whether these mechanisms can
> only
> > > be
> > > > > used
> > > > > > > for AI scenarios, but whether they jointly form the runtime
> > > contract
> > > > > > Flink
> > > > > > > needs to support AI-oriented data processing workloads,
> especially
> > > > > > > multimodal inference pipelines. Capabilities such as
> RpcOperator,
> > > > > > > checkpointing, scaling, and resource management can certainly
> serve
> > > > > > broader
> > > > > > > use cases. However, under this class of workloads, they require
> > > more
> > > > > > > consistent runtime semantics and coordinated design, and
> together
> > > > > > determine
> > > > > > > whether the system can provide production-grade execution,
> > > recovery,
> > > > > and
> > > > > > > elasticity.
> > > > > > >
> > > > > > > Compared with traditional streaming workloads, this class of
> > > > workloads
> > > > > > > changes the data shape, computation pattern, and resource model
> > > > > > > significantly. Data is no longer only small structured
> records; it
> > > > may
> > > > > be
> > > > > > > images, video, audio, tensors, embeddings, or object
> references to
> > > > > > external
> > > > > > > large objects. Many pipelines also have a relatively
> shuffle-light
> > > > > shape,
> > > > > > > such as URI → preprocessing → inference → sink. The computation
> > > logic
> > > > > > often
> > > > > > > includes model inference or service-style invocation, either as
> > > > remote
> > > > > > > inference service invocation or local GPU / accelerator-backed
> > > > > execution.
> > > > > > > Therefore, the system needs to handle not only ordinary RPC
> calls,
> > > > but
> > > > > > also
> > > > > > > service discovery, backpressure propagation, batching /
> concurrency
> > > > > > > control, timeout / retry, in-flight request draining, model
> > > loading,
> > > > > GPU
> > > > > > > warmup, resource scheduling, and fault recovery.
> > > > > > >
> > > > > > > From this perspective, RpcOperator is not just “another way to
> call
> > > > an
> > > > > > > external service,” nor is it merely a deployment mechanism for
> GPU
> > > > > > > operators. More importantly, it defines a service-style
> operator
> > > > > > > abstraction: when inference becomes part of the logical data
> flow,
> > > > > Flink
> > > > > > > needs to understand and coordinate these runtime semantics,
> rather
> > > > than
> > > > > > > hiding inference completely behind external black-box calls
> inside
> > > > user
> > > > > > > code.
> > > > > > >
> > > > > > > Some of the Layer 3 runtime improvements follow the same logic.
> > > While
> > > > > > > mechanisms such as checkpointing or scaling are not exclusive
> to AI
> > > > > > > workloads, inference-oriented workloads fundamentally change
> their
> > > > > > > operational assumptions and cost model, making runtime
> behavior far
> > > > > more
> > > > > > > critical than in traditional data processing systems. When
> > > per-record
> > > > > > > computation is expensive, GPU warmup and model loading are
> costly,
> > > > and
> > > > > > the
> > > > > > > execution environment may involve elastic / preemptible
> resources,
> > > > the
> > > > > > cost
> > > > > > > of global rollback or disruptive scaling becomes much higher
> than
> > > in
> > > > > > > traditional row-at-a-time BI / ETL workloads. As a result,
> runtime
> > > > > > behavior
> > > > > > > that may have been acceptable for traditional workloads can
> > > directly
> > > > > > affect
> > > > > > > stability and resource efficiency in inference-oriented
> workloads.
> > > > > > >
> > > > > > > At the same time, the umbrella proposal helps provide a shared
> > > > context
> > > > > > for
> > > > > > > discussing how these capabilities relate to each other and what
> > > > common
> > > > > > > runtime assumptions they rely on. The more important value of
> the
> > > > > > umbrella
> > > > > > > is to align on the workload model, design principles,
> boundaries,
> > > and
> > > > > > > dependencies between capabilities, so that independently
> evolving
> > > > > pieces
> > > > > > > such as RpcOperator, GPU resource declaration, batching /
> > > concurrency
> > > > > > > control, non-disruptive scaling, and regional checkpointing do
> not
> > > > end
> > > > > up
> > > > > > > with inconsistent runtime semantics.
> > > > > > >
> > > > > > > Based on this discussion, I will update the proposal to make
> the
> > > > > workload
> > > > > > > model, RpcOperator boundary, and Layer 3 dependency
> relationship
> > > > > clearer.
> > > > > > >
> > > > > > > Best,
> > > > > > > Guowei
> > > > > > > Best,
> > > > > > > Guowei
> > > > > > >
> > > > > > >
> > > > > > > On Fri, May 8, 2026 at 12:37 PM zhangjiaogg <
> [email protected]>
> > > > > wrote:
> > > > > > >
> > > > > > > > Hi Guowei and all,
> > > > > > > >
> > > > > > > > Thank you for driving this initiative. Strong +1 on the
> overall
> > > > > > > direction.
> > > > > > > >
> > > > > > > > From our perspective, the core value of this proposal lies
> in two
> > > > > > areas:
> > > > > > > > extending Flink's intelligent processing capabilities for
> > > > multimodal
> > > > > > > data,
> > > > > > > > and enabling native, in-pipeline local inference. As AI
> > > > capabilities
> > > > > > > > continue to advance, multimodal data is accounting for an
> > > > > increasingly
> > > > > > > > large share of overall data volume, and the ability to
> perform
> > > > > > > intelligent,
> > > > > > > > real-time processing on this data — not just ingestion or
> > > routing,
> > > > > but
> > > > > > > > actual inference and transformation within the stream — is
> > > > becoming a
> > > > > > > > critical requirement across industries. Today, most pipelines
> > > treat
> > > > > > > > multimodal objects as opaque blobs and push inference to
> external
> > > > > > > systems,
> > > > > > > > which works but at the cost of complexity, latency, and
> > > > consistency.
> > > > > > > >
> > > > > > > > This is exactly the pain one of our customers experiences in
> > > their
> > > > > > > > autonomous driving data pipelines. Video and image data
> captured
> > > by
> > > > > > > onboard
> > > > > > > > cameras must go through annotation, frame extraction, quality
> > > > > > filtering,
> > > > > > > > and both unstructured and structured data transformation
> before
> > > it
> > > > > can
> > > > > > be
> > > > > > > > used for model training — a workflow that today requires
> > > combining
> > > > > > > multiple
> > > > > > > > specialized systems: a stream engine for structured
> processing, a
> > > > > > > separate
> > > > > > > > framework (e.g., Ray Data) for multimodal processing, and an
> > > > external
> > > > > > > model
> > > > > > > > serving layer for inference. Each system boundary introduces
> > > > > > intermediate
> > > > > > > > storage, operational overhead, and data consistency
> challenges
> > > > across
> > > > > > the
> > > > > > > > full pipeline. If Flink can handle this end-to-end — with
> native
> > > > > > > multimodal
> > > > > > > > types, local GPU inference operators, and unified
> checkpointing —
> > > > the
> > > > > > > > entire workflow becomes a single Flink job, intermediate
> storage
> > > is
> > > > > > > > eliminated, and fault recovery covers the pipeline as a
> whole.
> > > > > > > >
> > > > > > > > We look forward to the sub-FLIP discussions and would be
> happy to
> > > > > > > > contribute.
> > > > > > > >
> > > > > > > > Best regards
> > > > > > > > Jiao Zhang
> > > > > > > >
> > > > > > > > At 2026-05-07 12:58:15, "zihao chen" <[email protected]>
> > > > wrote:
> > > > > > > > >Hi, all,
> > > > > > > > >
> > > > > > > > >I’d like to share some thoughts based on our internal
> experience
> > > > > > > > >with AI workloads on Flink.
> > > > > > > > >
> > > > > > > > >At Tencent, we have production scenarios where Flink is
> used in
> > > > > > > > >AI-related pipelines.
> > > > > > > > >
> > > > > > > > >Based on these workloads, we explored elasticity and
> autoscaling
> > > > > > > > >for cloud-native stream processing systems and published our
> > > > > > > > >experience in SIGMOD 2025:
> > > > > > > > >
> > > > > > > > >"Oceanus: Enable SLO-Aware Vertical Autoscaling for
> > > > > > > > >Cloud-Native Streaming Services in Tencent" [1]
> > > > > > > > >
> > > > > > > > >As our workloads evolved, we also started to see increasing
> > > > > > > > >GPU-based training and inference scenarios.
> > > > > > > > >
> > > > > > > > >Our current solution integrates Flink with external GPU
> > > services.
> > > > > > > > >While this works functionally, it also introduces several
> > > > practical
> > > > > > > > >issues, such as:
> > > > > > > > >
> > > > > > > > >   -
> > > > > > > > >
> > > > > > > > >   fragmented lifecycle management
> > > > > > > > >   -
> > > > > > > > >
> > > > > > > > >   operational complexity
> > > > > > > > >   -
> > > > > > > > >
> > > > > > > > >   inconsistent scaling/recovery behavior across systems
> > > > > > > > >
> > > > > > > > >From this perspective, I think FLIP-577 is exploring a very
> > > > > > > > >meaningful direction.
> > > > > > > > >
> > > > > > > > >In particular, I agree with the idea of integrating
> GPU-backed
> > > > > > > > >computation more naturally into Flink’s runtime model,
> instead
> > > of
> > > > > > > > >treating it purely as an external service integration
> problem.
> > > > > > > > >
> > > > > > > > >Besides, from the elasticity perspective, our experience is
> that
> > > > > > > > >GPU workloads have very different characteristics compared
> with
> > > > > > > > >traditional CPU workloads:
> > > > > > > > >
> > > > > > > > >   -
> > > > > > > > >
> > > > > > > > >   GPU resources are expensive and scarce
> > > > > > > > >   -
> > > > > > > > >
> > > > > > > > >   Startup and replay costs are significantly higher
> > > > > > > > >   -
> > > > > > > > >
> > > > > > > > >   Long-running tasks make scaling and recovery more
> challenging
> > > > > > > > >
> > > > > > > > >In our experience, GPU elasticity cannot simply reuse the
> > > > > > > > >assumptions behind traditional CPU elasticity.
> > > > > > > > >
> > > > > > > > >Because of this, elasticity becomes especially important for
> > > > > > > > >production AI workloads, not only for resource efficiency,
> but
> > > > also
> > > > > > > > >for reducing scaling and recovery overhead.
> > > > > > > > >
> > > > > > > > >More broadly, AI workloads increasingly require Flink to
> > > > collaborate
> > > > > > > > >more naturally with GPU-backed computation, and I believe
> > > > > > > > >FLIP-577 is exploring an important direction toward
> addressing
> > > > > > > > >this gap.
> > > > > > > > >
> > > > > > > > >Overall, I’m looking forward to further discussions about
> this
> > > > FLIP.
> > > > > > > > >
> > > > > > > > >[1] https://dl.acm.org/doi/abs/10.1145/3722212.3724445
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >Best,
> > > > > > > > >Zihao Chen
> > > > > > > > >
> > > > > > > > >Yong Fang <[email protected]> 于2026年5月7日周四 11:11写道：
> > > > > > > > >
> > > > > > > > >> Hi devs,
> > > > > > > > >>
> > > > > > > > >> Thanks Guowei for initiating this proposal. I think this
> is an
> > > > > > > important
> > > > > > > > >> step for Flink towards the era of AI data processing,
> very big
> > > > +1.
> > > > > > > > >>
> > > > > > > > >> I’d like to share some scenarios and requirements of
> > > leveraging
> > > > > > > PyFlink
> > > > > > > > for
> > > > > > > > >> AI data processing at ByteDance. Currently, we run tens of
> > > > > thousands
> > > > > > > of
> > > > > > > > >> pyflink/flink jobs, using millions of CPU cores.
> > > > > > > > >>
> > > > > > > > >> 1) Multimodal Data Processing
> > > > > > > > >> We want to use PyFlink to generate multimodal feature
> data.
> > > The
> > > > > > > typical
> > > > > > > > >> workflow starts by reading ID-based raw data and
> performing
> > > > large
> > > > > > > table
> > > > > > > > >> joins and ETL computations. We then fetch multimodal
> assets
> > > such
> > > > > as
> > > > > > > > images,
> > > > > > > > >> videos, texts and audios from object storage by ID. These
> > > > > multimodal
> > > > > > > > data
> > > > > > > > >> are either sent to an RPC service (backed by local models
> or
> > > > > remote
> > > > > > > > large
> > > > > > > > >> models), or processed via local GPU computing for frame
> > > > > extraction,
> > > > > > > > >> embedding generation and other tasks. After multimodal
> > > > > computation,
> > > > > > > > output
> > > > > > > > >> results including embeddings, processed images and
> multimodal
> > > > > > metadata
> > > > > > > > are
> > > > > > > > >> generated and persisted into the downstream multimodal
> data
> > > > lake.
> > > > > > > > >>
> > > > > > > > >> 2) Stream-Batch Unified Data Training
> > > > > > > > >> We use PyFlink to consume processed sample data from MQ or
> > > data
> > > > > > lakes.
> > > > > > > > >> Within the data pipeline, data may be shuffled by key,
> then
> > > fed
> > > > > into
> > > > > > > > >> parameter servers or local services for CPU-based or
> GPU-based
> > > > > model
> > > > > > > > >> training. Such workloads strongly demand optimized CPU &
> GPU
> > > > > hybrid
> > > > > > > > >> scheduling, worker node restart capability, fast scaling,
> as
> > > > well
> > > > > as
> > > > > > > > native
> > > > > > > > >> support for unified streaming and batch training.
> > > > > > > > >>
> > > > > > > > >> While supporting the above two categories of workloads, we
> > > have
> > > > > > > several
> > > > > > > > >> common requirements for PyFlink and Flink Core:
> > > > > > > > >>
> > > > > > > > >> 1) Native Python Computing Capability
> > > > > > > > >> We need more user-friendly DataFrame APIs, comprehensive
> > > > built-in
> > > > > > data
> > > > > > > > >> types for image and audio, as well as richer multimodal
> > > > computing
> > > > > > > > >> operators. This enables users to develop multimodal data
> > > > > processing
> > > > > > > jobs
> > > > > > > > >> more efficiently, and allows better optimization and
> > > scheduling
> > > > at
> > > > > > the
> > > > > > > > >> execution plan layer.
> > > > > > > > >>
> > > > > > > > >> 2) CPU & GPU Scheduling and Resource Management
> > > > > > > > >> It is expected to tag resource requirements at
> fine-grained
> > > > levels
> > > > > > > such
> > > > > > > > as
> > > > > > > > >> python user-defined functions. The scheduler should
> provide
> > > > > enhanced
> > > > > > > > >> resource orchestration and task scheduling, enabling more
> > > > flexible
> > > > > > > > >> heterogeneous resource management.
> > > > > > > > >>
> > > > > > > > >> 3) Embedded Server Node Capability in Pipeline
> > > > > > > > >> We hope to launch dedicated server nodes inside a single
> > > > pipeline,
> > > > > > to
> > > > > > > > load
> > > > > > > > >> local models or access remote models, which may be shared
> > > > between
> > > > > > > > different
> > > > > > > > >> operators in the pipeline. This unifies data ETL and
> > > multimodal
> > > > > > > > computing
> > > > > > > > >> within one end-to-end pipeline, greatly simplifying
> operation
> > > > and
> > > > > > > > >> maintenance for business teams.
> > > > > > > > >>
> > > > > > > > >> 4) Performance and Stability Optimization
> > > > > > > > >> Key enhancements include zero-copy data transfer between
> Flink
> > > > TM
> > > > > > > > processes
> > > > > > > > >> and Python processes, fast scaling of compute nodes, fast
> > > > > > > checkpointing
> > > > > > > > >> mechanism, and shuffle optimization for large-scale
> datasets.
> > > > > These
> > > > > > > > >> improvements will significantly boost the performance and
> > > > > stability
> > > > > > of
> > > > > > > > >> PyFlink multimodal workloads.
> > > > > > > > >>
> > > > > > > > >> I’m really excited to see that FLIP-577 has covered all
> the
> > > real
> > > > > > > > production
> > > > > > > > >> scenarios and requirements mentioned above. It's a good
> chance
> > > > to
> > > > > > > > iterate
> > > > > > > > >> and enrich the core capabilities of PyFlink and Flink Core
> > > > > targeting
> > > > > > > > these
> > > > > > > > >> AI data processing scenarios, and build Flink into a
> > > first-class
> > > > > AI
> > > > > > > data
> > > > > > > > >> processing engine.
> > > > > > > > >>
> > > > > > > > >> I’m very much looking forward to the progress.
> > > > > > > > >>
> > > > > > > > >> Best,
> > > > > > > > >> FangYong
> > > > > > > > >>
> > > > > > > > >> On Wed, May 6, 2026 at 8:36 PM FeatZhang <
> > > [email protected]>
> > > > > > > wrote:
> > > > > > > > >>
> > > > > > > > >> > Hi all,
> > > > > > > > >> >
> > > > > > > > >> > This is a great topic, and honestly long overdue.
> > > > > > > > >> >
> > > > > > > > >> > With the rapid growth of AI applications, we have been
> > > seeing
> > > > a
> > > > > > > > >> > significant increase in real-world demands from users
> who
> > > are
> > > > > > > already
> > > > > > > > >> > building on Flink and other traditional data processing
> or
> > > BI
> > > > > > > engines.
> > > > > > > > >> > From a platform perspective, more and more teams are
> trying
> > > to
> > > > > > > > >> > integrate AI capabilities directly into their existing
> > > > streaming
> > > > > > > > >> > pipelines, rather than treating them as separate
> systems.
> > > > > > > > >> >
> > > > > > > > >> > This is not an isolated trend — it is becoming a common
> > > > > > requirement
> > > > > > > > >> > across industries.
> > > > > > > > >> >
> > > > > > > > >> > ________________________________
> > > > > > > > >> >
> > > > > > > > >> > 1. What is changing in real systems
> > > > > > > > >> >
> > > > > > > > >> > We are observing a consistent shift:
> > > > > > > > >> >
> > > > > > > > >> > AI is moving from offline analysis or request-time
> scoring
> > > to
> > > > > > > > >> > continuous, event-driven decision making.
> > > > > > > > >> >
> > > > > > > > >> > In other words:
> > > > > > > > >> >
> > > > > > > > >> > AI is becoming part of the data stream itself.
> > > > > > > > >> >
> > > > > > > > >> > ________________________________
> > > > > > > > >> >
> > > > > > > > >> > 2. Representative production scenarios
> > > > > > > > >> >
> > > > > > > > >> > 2.1 Real-time fraud detection (per-event decision under
> > > strict
> > > > > > > > latency)
> > > > > > > > >> >
> > > > > > > > >> > Typical setup
> > > > > > > > >> >
> > > > > > > > >> > Continuous transaction stream (payments, logins,
> transfers)
> > > > > > > > >> > Each event must be evaluated within milliseconds
> > > > > > > > >> > Decision depends on:
> > > > > > > > >> >
> > > > > > > > >> > recent user behavior
> > > > > > > > >> > device / IP patterns
> > > > > > > > >> > short-term aggregates
> > > > > > > > >> >
> > > > > > > > >> > What is happening in practice
> > > > > > > > >> >
> > > > > > > > >> > Models are already deeply integrated into decision flow
> > > > > > > > >> > Feature freshness directly impacts detection accuracy
> > > > > > > > >> >
> > > > > > > > >> > Current pain
> > > > > > > > >> >
> > > > > > > > >> > Features computed in streaming, but inference is remote
> > > > > > > > >> > Network overhead adds to critical path latency
> > > > > > > > >> > Hard to ensure training/serving consistency
> > > > > > > > >> >
> > > > > > > > >> > ________________________________
> > > > > > > > >> >
> > > > > > > > >> > 2.2 Real-time recommendation and ads (continuous
> re-ranking)
> > > > > > > > >> >
> > > > > > > > >> > Typical setup
> > > > > > > > >> >
> > > > > > > > >> > User interaction stream (click, view, dwell time)
> > > > > > > > >> > Continuous feature updates (session + short-term
> behavior)
> > > > > > > > >> > Inference triggered per interaction
> > > > > > > > >> >
> > > > > > > > >> > What is happening in practice
> > > > > > > > >> >
> > > > > > > > >> > Increasing reliance on real-time context
> > > > > > > > >> > Model-based ranking becomes core logic
> > > > > > > > >> >
> > > > > > > > >> > Current pain
> > > > > > > > >> >
> > > > > > > > >> > Offline and online feature pipelines diverge
> > > > > > > > >> > Training-serving skew is common
> > > > > > > > >> > Inference orchestration is ad-hoc
> > > > > > > > >> >
> > > > > > > > >> > ________________________________
> > > > > > > > >> >
> > > > > > > > >> > 2.3 Streaming RAG / knowledge systems (continuous
> indexing)
> > > > > > > > >> >
> > > > > > > > >> > Typical setup
> > > > > > > > >> >
> > > > > > > > >> > Continuous ingestion of documents, logs, or knowledge
> > > > > > > > >> > Pipeline:
> > > > > > > > >> >
> > > > > > > > >> > chunking → embedding → indexing → retrieval
> > > > > > > > >> >
> > > > > > > > >> > Typical use cases
> > > > > > > > >> >
> > > > > > > > >> > AI copilots
> > > > > > > > >> > enterprise knowledge assistants
> > > > > > > > >> > observability systems
> > > > > > > > >> >
> > > > > > > > >> > Current pain
> > > > > > > > >> >
> > > > > > > > >> > Built via loosely coupled services or scripts
> > > > > > > > >> > No strong consistency guarantees
> > > > > > > > >> > Difficult to scale and recover
> > > > > > > > >> >
> > > > > > > > >> > ________________________________
> > > > > > > > >> >
> > > > > > > > >> > 2.4 Real-time feedback loop (continuous evaluation)
> > > > > > > > >> >
> > > > > > > > >> > Typical setup
> > > > > > > > >> >
> > > > > > > > >> > Prediction at time T
> > > > > > > > >> > Label arrives at T + Δ
> > > > > > > > >> >
> > > > > > > > >> > Required processing
> > > > > > > > >> >
> > > > > > > > >> > prediction stream JOIN label stream → metrics →
> optimization
> > > > > > > > >> >
> > > > > > > > >> > Current pain
> > > > > > > > >> >
> > > > > > > > >> > Alignment logic duplicated across systems
> > > > > > > > >> > Late data handling is complex
> > > > > > > > >> > No reusable evaluation abstraction
> > > > > > > > >> >
> > > > > > > > >> > ________________________________
> > > > > > > > >> >
> > > > > > > > >> > 2.5 AI replacing rule-based decision logic
> > > > > > > > >> >
> > > > > > > > >> > Evolution
> > > > > > > > >> >
> > > > > > > > >> > From:
> > > > > > > > >> >
> > > > > > > > >> > rule engine / CEP
> > > > > > > > >> >
> > > > > > > > >> > To:
> > > > > > > > >> >
> > > > > > > > >> > model / LLM → decision
> > > > > > > > >> >
> > > > > > > > >> > Implication
> > > > > > > > >> >
> > > > > > > > >> > AI is becoming the core decision layer inside streaming
> > > > systems.
> > > > > > > > >> >
> > > > > > > > >> > ________________________________
> > > > > > > > >> >
> > > > > > > > >> > 3. Architectural shift
> > > > > > > > >> >
> > > > > > > > >> > Across all scenarios:
> > > > > > > > >> >
> > > > > > > > >> > From:
> > > > > > > > >> >
> > > > > > > > >> > data processing → feature system → model serving →
> > > evaluation
> > > > > > > > >> >
> > > > > > > > >> > To:
> > > > > > > > >> >
> > > > > > > > >> > stream = feature + inference + decision + feedback
> > > > > > > > >> >
> > > > > > > > >> > This reflects a fundamental change in system boundaries.
> > > > > > > > >> >
> > > > > > > > >> > ________________________________
> > > > > > > > >> >
> > > > > > > > >> > 4. Why externalized architectures break down
> > > > > > > > >> >
> > > > > > > > >> > Most current implementations rely on multiple systems:
> > > > > > > > >> >
> > > > > > > > >> > stream processing
> > > > > > > > >> > feature store
> > > > > > > > >> > model serving
> > > > > > > > >> > vector database
> > > > > > > > >> >
> > > > > > > > >> > This introduces several fundamental issues in real-time
> > > > > scenarios.
> > > > > > > > >> >
> > > > > > > > >> > 4.1 Latency dominated by system boundaries
> > > > > > > > >> >
> > > > > > > > >> > stream → network → model service → response
> > > > > > > > >> >
> > > > > > > > >> > network overhead is unavoidable
> > > > > > > > >> > batching is not controlled by the stream runtime
> > > > > > > > >> > no end-to-end backpressure
> > > > > > > > >> >
> > > > > > > > >> > Latency becomes a system-level artifact rather than a
> > > compute
> > > > > > > > property.
> > > > > > > > >> >
> > > > > > > > >> > ________________________________
> > > > > > > > >> >
> > > > > > > > >> > 4.2 Inconsistent data between training and serving
> > > > > > > > >> >
> > > > > > > > >> > offline vs online features
> > > > > > > > >> > different definitions or time windows
> > > > > > > > >> >
> > > > > > > > >> > Models operate on inconsistent data distributions.
> > > > > > > > >> >
> > > > > > > > >> > ________________________________
> > > > > > > > >> >
> > > > > > > > >> > 4.3 State fragmentation
> > > > > > > > >> >
> > > > > > > > >> > user/session context must be rebuilt or fetched
> > > > > > > > >> > loss of data locality
> > > > > > > > >> > processing becomes call-driven
> > > > > > > > >> >
> > > > > > > > >> > ________________________________
> > > > > > > > >> >
> > > > > > > > >> > 4.4 Feedback loop is not composable
> > > > > > > > >> >
> > > > > > > > >> > difficult alignment of prediction and label streams
> > > > > > > > >> > no unified handling of late data
> > > > > > > > >> > duplicated evaluation logic
> > > > > > > > >> >
> > > > > > > > >> > ________________________________
> > > > > > > > >> >
> > > > > > > > >> > 4.5 Operational complexity
> > > > > > > > >> >
> > > > > > > > >> > multiple systems to scale
> > > > > > > > >> > multiple failure domains
> > > > > > > > >> > complex debugging paths
> > > > > > > > >> >
> > > > > > > > >> > ________________________________
> > > > > > > > >> >
> > > > > > > > >> > 5. Why this aligns with Flink
> > > > > > > > >> >
> > > > > > > > >> > These workloads require:
> > > > > > > > >> >
> > > > > > > > >> > event-driven execution
> > > > > > > > >> > strong state management
> > > > > > > > >> > precise time semantics
> > > > > > > > >> > continuous feedback
> > > > > > > > >> >
> > > > > > > > >> > These are exactly Flink’s core strengths.
> > > > > > > > >> >
> > > > > > > > >> > The key insight is:
> > > > > > > > >> >
> > > > > > > > >> > Inference should be modeled as a dataflow operator, not
> an
> > > > > > external
> > > > > > > > >> > service.
> > > > > > > > >> >
> > > > > > > > >> > ________________________________
> > > > > > > > >> >
> > > > > > > > >> > 6. Implication
> > > > > > > > >> >
> > > > > > > > >> > If we model:
> > > > > > > > >> >
> > > > > > > > >> > data → feature → inference → decision → feedback
> > > > > > > > >> >
> > > > > > > > >> > within Flink, we can achieve:
> > > > > > > > >> >
> > > > > > > > >> > unified scheduling
> > > > > > > > >> > shared state
> > > > > > > > >> > consistent time semantics
> > > > > > > > >> > end-to-end fault tolerance
> > > > > > > > >> >
> > > > > > > > >> > ________________________________
> > > > > > > > >> >
> > > > > > > > >> > 7. Conclusion
> > > > > > > > >> >
> > > > > > > > >> > This is not simply about adding AI support to Flink.
> > > > > > > > >> >
> > > > > > > > >> > It is about recognizing that:
> > > > > > > > >> >
> > > > > > > > >> > Real-time AI systems are fundamentally streaming
> systems.
> > > > > > > > >> >
> > > > > > > > >> > The question is whether Flink evolves to support this
> > > > natively,
> > > > > or
> > > > > > > > >> > remains a preprocessing layer in front of external AI
> > > stacks.
> > > > > > > > >> >
> > > > > > > > >> > ________________________________
> > > > > > > > >> >
> > > > > > > > >> > Happy to follow up with a more concrete proposal (e.g.,
> > > > > inference
> > > > > > > > >> > operator abstraction) if there is interest.
> > > > > > > > >> >
> > > > > > > > >> > Thanks.
> > > > > > > > >> >
> > > > > > > > >> > On Mon, May 4, 2026 at 10:31 PM Gen Luo <
> > > [email protected]>
> > > > > > > wrote:
> > > > > > > > >> > >
> > > > > > > > >> > > Hi all,
> > > > > > > > >> > >
> > > > > > > > >> > > Thank you Guowei Ma for driving this discussion, and
> > > thanks
> > > > > > > everyone
> > > > > > > > >> for
> > > > > > > > >> > > the valuable insights. Inspired by this exchange, I’d
> like
> > > > to
> > > > > > > share
> > > > > > > > a
> > > > > > > > >> few
> > > > > > > > >> > > thoughts.
> > > > > > > > >> > >
> > > > > > > > >> > > While “AI-Native” covers broad ground, I believe this
> FLIP
> > > > > does
> > > > > > > not
> > > > > > > > >> > > overextend Flink’s scope. It’s a necessary iteration
> > > driven
> > > > by
> > > > > > > > evolving
> > > > > > > > >> > > user scenarios and AI advancements, particularly
> > > multimodal
> > > > > > > > processing.
> > > > > > > > >> > > Given the growing adoption of multimodal applications
> and
> > > > > > > increasing
> > > > > > > > >> > > interest in low-latency inference, initiating these
> > > > > enhancements
> > > > > > > is
> > > > > > > > a
> > > > > > > > >> > > timely step to better align Flink with evolving AI
> > > > workloads.
> > > > > > > > >> > >
> > > > > > > > >> > > From our engagements with customers and developers, we
> > > > > observe a
> > > > > > > > clear
> > > > > > > > >> > > shift in both workloads and user expectations. Model
> > > > inference
> > > > > > is
> > > > > > > > >> > > increasingly central to data pipelines, with
> multimodal AI
> > > > > tasks
> > > > > > > > >> growing
> > > > > > > > >> > > rapidly. Traditional real-time scenarios (e.g.,
> monitoring
> > > > and
> > > > > > > > >> analytics)
> > > > > > > > >> > > now leverage models and agent frameworks like Flink
> Agent
> > > > for
> > > > > > > > >> > intelligent,
> > > > > > > > >> > > multi-turn decision-making, while large-scale offline
> > > > compute
> > > > > is
> > > > > > > > also
> > > > > > > > >> > > shifting toward LLMs and vision models. Alongside this
> > > > > workload
> > > > > > > > >> > evolution,
> > > > > > > > >> > > developer workflows have adapted: AI practitioners
> > > naturally
> > > > > > > prefer
> > > > > > > > >> > Python
> > > > > > > > >> > > and DataFrame-style APIs. As AI-assisted coding
> matures,
> > > > > > aligning
> > > > > > > > >> system
> > > > > > > > >> > > interfaces with these familiar patterns will directly
> > > > improve
> > > > > > > > >> > AI-generated
> > > > > > > > >> > > code quality and significantly lower adoption
> barriers for
> > > > the
> > > > > > AI
> > > > > > > > >> > community.
> > > > > > > > >> > >
> > > > > > > > >> > > Today, many AI evaluation tools don’t yet recommend
> Flink
> > > > for
> > > > > AI
> > > > > > > > >> > > workloads—largely due to limited visibility of Flink’s
> > > > > relevant
> > > > > > > > >> > > capabilities rather than fundamental incompatibility.
> In
> > > > > > reality,
> > > > > > > > Flink
> > > > > > > > >> > has
> > > > > > > > >> > > unique strengths here. For example, generating
> multimodal
> > > > > > samples
> > > > > > > is
> > > > > > > > >> > often
> > > > > > > > >> > > a multi-day, GPU-heavy process. Flink’s streaming
> model,
> > > > > > combined
> > > > > > > > with
> > > > > > > > >> > > checkpointing and reduced disk I/O, is well-suited for
> > > such
> > > > > > > > >> long-running
> > > > > > > > >> > > tasks—a direction also pursued by engines like Daft
> and
> > > Ray
> > > > > > Data.
> > > > > > > > With
> > > > > > > > >> > > Flink’s proven production stability, we’re
> well-positioned
> > > > for
> > > > > > > both
> > > > > > > > >> batch
> > > > > > > > >> > > and future real-time multimodal streaming inference.
> > > > Targeted
> > > > > > > > >> > improvements
> > > > > > > > >> > > can make these advantages visible, driving better user
> > > > > > experiences
> > > > > > > > and
> > > > > > > > >> > > healthier ecosystem growth.
> > > > > > > > >> > >
> > > > > > > > >> > > I’d also note a lesson from FlinkML. It attempted to
> cover
> > > > > model
> > > > > > > > >> training
> > > > > > > > >> > > but struggled to align with the fast-iteration,
> > > > > > > > Python/notebook-centric
> > > > > > > > >> > > workflows preferred by ML researchers. Flink’s core
> > > strength
> > > > > > lies
> > > > > > > in
> > > > > > > > >> > > high-concurrency, production-grade inference
> > > > orchestration—not
> > > > > > > > training
> > > > > > > > >> > > lifecycle management (e.g., experiment tracking,
> > > > versioning).
> > > > > > This
> > > > > > > > >> > mismatch
> > > > > > > > >> > > limited its adoption.
> > > > > > > > >> > >
> > > > > > > > >> > > This proposal, however, takes a different path. It
> doesn’t
> > > > aim
> > > > > > to
> > > > > > > > >> replace
> > > > > > > > >> > > training frameworks. Instead, it introduces modern AI
> > > > concepts
> > > > > > > > >> > (multimodal
> > > > > > > > >> > > data, LLMs) as first-class citizens for inference,
> built
> > > > atop
> > > > > > > > Flink’s
> > > > > > > > >> > > computation strengths. Think Ray Data’s scope (plus
> simple
> > > > > > > > co-located
> > > > > > > > >> > > serving), not Train/Tune. Crucially, unlike the
> FlinkML
> > > era,
> > > > > > > today’s
> > > > > > > > >> > models
> > > > > > > > >> > > use standardized interfaces and mature serving
> frameworks,
> > > > > > > allowing
> > > > > > > > >> Flink
> > > > > > > > >> > > to integrate external models seamlessly without heavy
> > > > > > > > >> > > customization—significantly lowering project risk.
> > > > > > > > >> > >
> > > > > > > > >> > > This FLIP marks Flink’s another starting point for
> the AI
> > > > era.
> > > > > > > While
> > > > > > > > >> > > details need refinement, I believe this direction
> aligns
> > > > with
> > > > > > both
> > > > > > > > >> > current
> > > > > > > > >> > > and future user needs and Flink’s evolution.
> > > > > > > > >> > >
> > > > > > > > >> > > Best,
> > > > > > > > >> > > Gen
> > > > > > > > >> > >
> > > > > > > > >> > > On Mon, May 4, 2026 at 12:15 PM Jark Wu <
> [email protected]
> > > >
> > > > > > wrote:
> > > > > > > > >> > >
> > > > > > > > >> > > > Hi Guowei,
> > > > > > > > >> > > >
> > > > > > > > >> > > > Thanks for driving this. +1 on the overall
> direction.
> > > > > Flink's
> > > > > > > > >> > > > streaming processing and checkpoint mechanism give
> it a
> > > > > > > structural
> > > > > > > > >> > > > advantage over systems like Daft and Ray. But today,
> > > these
> > > > > > > runtime
> > > > > > > > >> > > > strengths are held back by gaps in Python API, GPU
> > > > > scheduling,
> > > > > > > and
> > > > > > > > >> > > > native multimodal data handling. This umbrella FLIP
> > > > > addresses
> > > > > > > > exactly
> > > > > > > > >> > > > that gap, comprehensively and systematically. I
> believe
> > > > > > > multimodal
> > > > > > > > >> > > > data processing is the biggest opportunity for
> > > traditional
> > > > > > data
> > > > > > > > infra
> > > > > > > > >> > > > to transition into AI infra, and this is one of the
> most
> > > > > > > important
> > > > > > > > >> > > > FLIPs for Flink in the AI era.
> > > > > > > > >> > > >
> > > > > > > > >> > > > As one of the Table/SQL module maintainers, we would
> > > like
> > > > to
> > > > > > > > >> > > > contribute the built-in multimodal processing UDFs
> > > (audio,
> > > > > > > video,
> > > > > > > > >> > > > image, text) and native multimodal data types
> (Tensor,
> > > > > Image,
> > > > > > > > >> > > > Embedding, etc.) as first-class citizens in the type
> > > > system.
> > > > > > > > Looking
> > > > > > > > >> > > > forward to the sub-FLIP discussions.
> > > > > > > > >> > > >
> > > > > > > > >> > > > Best,
> > > > > > > > >> > > > Jark
> > > > > > > > >> > > >
> > > > > > > > >> > > > On Thu, 30 Apr 2026 at 18:42, Guowei Ma <
> > > > > [email protected]
> > > > > > >
> > > > > > > > >> wrote:
> > > > > > > > >> > > > >
> > > > > > > > >> > > > > Hi,Yaroslav
> > > > > > > > >> > > > >
> > > > > > > > >> > > > > Thanks for taking the time to write this detailed
> > > > > feedback.
> > > > > > > Let
> > > > > > > > me
> > > > > > > > >> > > > clarify
> > > > > > > > >> > > > > the intent of the proposal first.
> > > > > > > > >> > > > >
> > > > > > > > >> > > > > I am not saying that Flink should become an AI
> > > > framework,
> > > > > an
> > > > > > > ML
> > > > > > > > >> > platform,
> > > > > > > > >> > > > > or a model serving system. The way I use
> "AI-Native"
> > > in
> > > > > this
> > > > > > > > >> > proposal is
> > > > > > > > >> > > > to
> > > > > > > > >> > > > > say that Flink should support, as first-class
> > > citizens,
> > > > > the
> > > > > > > core
> > > > > > > > >> > objects
> > > > > > > > >> > > > > and execution patterns that frequently show up in
> > > > > > AI-oriented
> > > > > > > > data
> > > > > > > > >> > > > > processing — instead of leaving them entirely to
> > > > external
> > > > > > > > systems
> > > > > > > > >> or
> > > > > > > > >> > ad
> > > > > > > > >> > > > hoc
> > > > > > > > >> > > > > user-defined integrations.
> > > > > > > > >> > > > >
> > > > > > > > >> > > > > These objects and execution patterns include:
> > > > > > > > >> > > > >
> > > > > > > > >> > > > >    - Multimodal and unstructured data objects
> such as
> > > > > > images,
> > > > > > > > >> video,
> > > > > > > > >> > > > audio,
> > > > > > > > >> > > > >    tensors, embeddings, and object references.
> > > > > > > > >> > > > >    - Model inference as part of the data flow,
> rather
> > > > than
> > > > > > an
> > > > > > > > >> > entirely
> > > > > > > > >> > > > >    external black-box service call.
> > > > > > > > >> > > > >    - Operators backed by heterogeneous resources
> such
> > > as
> > > > > > GPUs.
> > > > > > > > >> > > > >    - Pythonic and vectorized processing styles.
> > > > > > > > >> > > > >    - Long-running, long-tailed asynchronous
> > > computation.
> > > > > > > > >> > > > >
> > > > > > > > >> > > > > "AI-Native" is just a shorthand here, meaning that
> > > Flink
> > > > > > > should
> > > > > > > > >> > natively
> > > > > > > > >> > > > > understand and support the core abstractions of
> this
> > > > class
> > > > > > of
> > > > > > > > >> > workloads.
> > > > > > > > >> > > > > The FLIP needs to make the target workload class
> > > > clearer.
> > > > > > What
> > > > > > > > we
> > > > > > > > >> > care
> > > > > > > > >> > > > > about is not any specific model paradigm — LLM,
> CV,
> > > > > > > > recommendation,
> > > > > > > > >> > or
> > > > > > > > >> > > > > traditional ML inference — but a class of data
> > > > processing
> > > > > > > > workloads
> > > > > > > > >> > with
> > > > > > > > >> > > > > shared runtime and topology characteristics:
> > > > > > > > >> > > > >
> > > > > > > > >> > > > >    - A single computation may take seconds or even
> > > > > minutes,
> > > > > > > > instead
> > > > > > > > >> > of
> > > > > > > > >> > > > >    microseconds as in traditional row-at-a-time
> > > > > processing.
> > > > > > > > >> > > > >    - Execution often involves heterogeneous
> resources
> > > > such
> > > > > > as
> > > > > > > > CPU +
> > > > > > > > >> > GPU,
> > > > > > > > >> > > > >    where GPUs are expensive and scarce.
> > > > > > > > >> > > > >    - Data is often multimodal large objects
> (images,
> > > > > video,
> > > > > > > > audio,
> > > > > > > > >> > > > tensors,
> > > > > > > > >> > > > >    embeddings), rather than structured small
> records.
> > > > > > > > >> > > > >    - Computation logic often includes model
> inference
> > > or
> > > > > > > > >> > service-style
> > > > > > > > >> > > > >    invocations as part of the pipeline.
> > > > > > > > >> > > > >    - Many target topologies are relatively
> > > shuffle-light
> > > > > and
> > > > > > > > don't
> > > > > > > > >> > > > >    necessarily involve complex keyed-state
> migration,
> > > > e.g.
> > > > > > > URI →
> > > > > > > > >> > > > preprocessing
> > > > > > > > >> > > > >    → inference → sink.
> > > > > > > > >> > > > >
> > > > > > > > >> > > > > Ten years ago, many ML workloads took the form of
> > > > offline
> > > > > > > > training
> > > > > > > > >> > plus
> > > > > > > > >> > > > > online feature serving. Flink already played a
> strong
> > > > role
> > > > > > in
> > > > > > > > >> feature
> > > > > > > > >> > > > > engineering, streaming feature computation, and
> > > > real-time
> > > > > > data
> > > > > > > > >> > > > preparation,
> > > > > > > > >> > > > > so there was no strong need to reshape Flink into
> an
> > > > > > > "ML-Native"
> > > > > > > > >> > engine.
> > > > > > > > >> > > > >
> > > > > > > > >> > > > > What is changing today is that model inference
> itself
> > > is
> > > > > > > > >> increasingly
> > > > > > > > >> > > > > becoming part of the data processing pipeline;
> > > > multimodal
> > > > > > > > objects
> > > > > > > > >> > are no
> > > > > > > > >> > > > > longer just opaque blobs in external storage, but
> data
> > > > > > objects
> > > > > > > > that
> > > > > > > > >> > need
> > > > > > > > >> > > > to
> > > > > > > > >> > > > > be referenced, passed, transformed, inferred
> over, and
> > > > > > landed
> > > > > > > > >> inside
> > > > > > > > >> > the
> > > > > > > > >> > > > > engine. This is not simply one more ML use case —
> it
> > > is
> > > > a
> > > > > > > > change in
> > > > > > > > >> > the
> > > > > > > > >> > > > > shape of workloads Flink needs to support.
> > > > > > > > >> > > > >
> > > > > > > > >> > > > > On whether the user demand is real, the validation
> > > > signals
> > > > > > we
> > > > > > > > are
> > > > > > > > >> > > > currently
> > > > > > > > >> > > > > seeing include:
> > > > > > > > >> > > > >
> > > > > > > > >> > > > >    - Within Alibaba, multimodal data processing is
> > > > already
> > > > > > in
> > > > > > > > >> > production,
> > > > > > > > >> > > > >    covering image, video, audio, and text
> modalities.
> > > > > > > > >> > > > >    - In offline conversations with several
> companies
> > > > > > > (including
> > > > > > > > >> > ByteDance
> > > > > > > > >> > > > >    and Tencent), we have heard substantial demand
> for
> > > > > Flink
> > > > > > to
> > > > > > > > >> > support
> > > > > > > > >> > > > AI data
> > > > > > > > >> > > > >    processing / multimodal data processing.
> > > > > > > > >> > > > >    - On the ecosystem side, we are working with
> NVIDIA
> > > > on
> > > > > a
> > > > > > > > joint
> > > > > > > > >> > demo
> > > > > > > > >> > > > >    focused on multimodal data processing, planned
> for
> > > > > Flink
> > > > > > > > Forward
> > > > > > > > >> > Asia.
> > > > > > > > >> > > > >    - The emergence and growth of systems such as
> Daft,
> > > > Ray
> > > > > > > Data,
> > > > > > > > >> > > > >    Data-Juicer, and LAS also reflect rapidly
> growing
> > > > > demand
> > > > > > > for
> > > > > > > > >> > > > multimodal
> > > > > > > > >> > > > >    data processing.
> > > > > > > > >> > > > >    - There have also been independent discussions
> in
> > > > this
> > > > > > > > direction
> > > > > > > > >> > > > within
> > > > > > > > >> > > > >    the community — for example, the
> "Streaming-native
> > > AI
> > > > > > > > Inference
> > > > > > > > >> > > > Runtime
> > > > > > > > >> > > > >    Layer" proposal on the dev list.
> > > > > > > > >> > > > >
> > > > > > > > >> > > > > On "why now, instead of waiting for
> standardization"
> > > — I
> > > > > > > > understand
> > > > > > > > >> > the
> > > > > > > > >> > > > > concern. LLM-related frameworks, APIs, and
> > > > > application-level
> > > > > > > > >> > patterns are
> > > > > > > > >> > > > > indeed changing quickly. If this FLIP were trying
> to
> > > > bake
> > > > > a
> > > > > > > > >> specific
> > > > > > > > >> > LLM
> > > > > > > > >> > > > > API, agent framework, or prompt protocol into
> Flink,
> > > the
> > > > > > risk
> > > > > > > > would
> > > > > > > > >> > be
> > > > > > > > >> > > > high.
> > > > > > > > >> > > > >
> > > > > > > > >> > > > > But most of the capabilities in this proposal are
> not
> > > > > > > > LLM-specific.
> > > > > > > > >> > They
> > > > > > > > >> > > > > are more fundamental data processing and runtime
> > > > > > capabilities:
> > > > > > > > >> > Pipeline
> > > > > > > > >> > > > > Region-level checkpointing, Object Reference, GPU
> > > > resource
> > > > > > > > >> > declaration,
> > > > > > > > >> > > > > columnar data transfer, service-style operator
> > > > invocation,
> > > > > > > > >> > long-running
> > > > > > > > >> > > > > async execution. These are useful for today's LLM
> > > > > workloads,
> > > > > > > and
> > > > > > > > >> > equally
> > > > > > > > >> > > > > useful for future AI workloads in shapes we cannot
> > > fully
> > > > > > > predict
> > > > > > > > >> > yet. The
> > > > > > > > >> > > > > fast-changing parts should live in the ecosystem
> and
> > > SDK
> > > > > > > layer;
> > > > > > > > the
> > > > > > > > >> > FLIP
> > > > > > > > >> > > > > should focus on more stable engine-level
> capabilities.
> > > > > > > > >> > > > >
> > > > > > > > >> > > > > On tactical changes vs. umbrella, I partly agree
> with
> > > > you.
> > > > > > > Each
> > > > > > > > >> > sub-FLIP
> > > > > > > > >> > > > > should be discussed, reviewed, and accepted or
> > > rejected
> > > > on
> > > > > > its
> > > > > > > > own
> > > > > > > > >> > > > merits.
> > > > > > > > >> > > > > The umbrella should not bypass the normal FLIP
> > > process,
> > > > > and
> > > > > > > > >> > accepting the
> > > > > > > > >> > > > > umbrella does not mean accepting all sub-FLIPs.
> That
> > > > > said, I
> > > > > > > > still
> > > > > > > > >> > think
> > > > > > > > >> > > > > the umbrella is valuable. Its purpose is not to
> bind
> > > the
> > > > > 11
> > > > > > > > changes
> > > > > > > > >> > into
> > > > > > > > >> > > > a
> > > > > > > > >> > > > > single inseparable package, but to help the
> community
> > > > > align
> > > > > > on
> > > > > > > > >> > > > principles,
> > > > > > > > >> > > > > clarify boundaries and dependencies, and avoid
> > > > conflicting
> > > > > > or
> > > > > > > > >> > duplicated
> > > > > > > > >> > > > > abstractions across related capabilities.
> > > > > > > > >> > > > >
> > > > > > > > >> > > > > For example, if RpcOperator is not considered
> together
> > > > > with
> > > > > > > > >> > > > non-disruptive
> > > > > > > > >> > > > > scaling, it is hard to give GPU operator
> elasticity
> > > > > coherent
> > > > > > > > >> > semantics.
> > > > > > > > >> > > > > Deploying inference services independently is
> only the
> > > > > first
> > > > > > > > step;
> > > > > > > > >> > the
> > > > > > > > >> > > > > harder question is how Flink uniformly handles
> service
> > > > > > > > discovery,
> > > > > > > > >> > > > in-flight
> > > > > > > > >> > > > > request draining, backpressure, and failover
> during
> > > > > scaling.
> > > > > > > > >> Without
> > > > > > > > >> > an
> > > > > > > > >> > > > > umbrella, these capabilities can certainly be
> advanced
> > > > as
> > > > > > > > tactical
> > > > > > > > >> > > > changes,
> > > > > > > > >> > > > > but we may end up with a set of abstractions that
> are
> > > > > > locally
> > > > > > > > >> usable
> > > > > > > > >> > but
> > > > > > > > >> > > > > globally inconsistent.
> > > > > > > > >> > > > >
> > > > > > > > >> > > > > On RpcOperator, I agree that we need to be very
> > > careful
> > > > in
> > > > > > > > defining
> > > > > > > > >> > the
> > > > > > > > >> > > > > boundary between the Flink runtime and external
> > > > > > orchestration
> > > > > > > > >> > systems.
> > > > > > > > >> > > > > Kubernetes or the Kubernetes Operator may well be
> the
> > > > > right
> > > > > > > > choice
> > > > > > > > >> > at the
> > > > > > > > >> > > > > physical deployment level. But I still believe
> Flink
> > > > > needs a
> > > > > > > > >> > first-class
> > > > > > > > >> > > > > RpcOperator abstraction, because deployment is
> only
> > > part
> > > > > of
> > > > > > > the
> > > > > > > > >> > problem —
> > > > > > > > >> > > > > the harder part is its semantic integration with
> the
> > > > Flink
> > > > > > > job.
> > > > > > > > >> > > > >
> > > > > > > > >> > > > > If model inference is part of the logical data
> flow,
> > > > Flink
> > > > > > > > needs at
> > > > > > > > >> > > > minimum
> > > > > > > > >> > > > > to be aware of its service discovery, backpressure
> > > > > behavior,
> > > > > > > > >> failover
> > > > > > > > >> > > > > behavior, in-flight request draining, and scaling
> > > > > > > coordination.
> > > > > > > > If
> > > > > > > > >> > it is
> > > > > > > > >> > > > > hidden entirely behind an external black-box
> service,
> > > it
> > > > > is
> > > > > > > hard
> > > > > > > > >> for
> > > > > > > > >> > > > Flink
> > > > > > > > >> > > > > to provide consistent job-level semantics and
> > > > operational
> > > > > > > > >> experience.
> > > > > > > > >> > > > >
> > > > > > > > >> > > > > So the point of RpcOperator is not necessarily
> that
> > > > "every
> > > > > > > > physical
> > > > > > > > >> > > > process
> > > > > > > > >> > > > > must be directly launched and managed by Flink
> core,"
> > > > but
> > > > > > that
> > > > > > > > >> Flink
> > > > > > > > >> > > > needs
> > > > > > > > >> > > > > to define a service-style operator contract that
> > > allows
> > > > > such
> > > > > > > > >> > operators to
> > > > > > > > >> > > > > be invoked correctly by the data flow, coordinated
> > > > > correctly
> > > > > > > by
> > > > > > > > the
> > > > > > > > >> > > > > runtime, and understood and operated by users as
> part
> > > > of a
> > > > > > > Flink
> > > > > > > > >> job.
> > > > > > > > >> > > > >
> > > > > > > > >> > > > > On vectorized batch processing, I agree the
> long-term
> > > > > > > direction
> > > > > > > > >> > should
> > > > > > > > >> > > > not
> > > > > > > > >> > > > > stop at Python. Native columnar / vectorized
> execution
> > > > is
> > > > > an
> > > > > > > > >> > end-to-end
> > > > > > > > >> > > > > problem that touches connectors, formats, the type
> > > > system,
> > > > > > > > runtime,
> > > > > > > > >> > Java,
> > > > > > > > >> > > > > SQL, and Python. The current proposal starts from
> the
> > > > > > > > Java/Python
> > > > > > > > >> > > > boundary
> > > > > > > > >> > > > > because that is where the row/column conversion
> > > overhead
> > > > > is
> > > > > > > most
> > > > > > > > >> > visible.
> > > > > > > > >> > > > > End-to-end columnar execution on the Java and SQL
> side
> > > > > > > deserves
> > > > > > > > to
> > > > > > > > >> be
> > > > > > > > >> > > > > discussed further as a separate, larger FLIP.
> > > > > > > > >> > > > >
> > > > > > > > >> > > > > On multimodal types and SerDes complexity, I agree
> > > this
> > > > > > needs
> > > > > > > > to be
> > > > > > > > >> > > > handled
> > > > > > > > >> > > > > carefully. Making AI-related objects first-class
> does
> > > > not
> > > > > > > imply
> > > > > > > > >> that
> > > > > > > > >> > > > every
> > > > > > > > >> > > > > connector must immediately and fully support
> image,
> > > > video,
> > > > > > > > audio,
> > > > > > > > >> > tensor,
> > > > > > > > >> > > > > and so on. The concrete incremental path, fallback
> > > > > strategy,
> > > > > > > and
> > > > > > > > >> the
> > > > > > > > >> > > > > boundary between formats, connector API, and the
> type
> > > > > system
> > > > > > > > will
> > > > > > > > >> be
> > > > > > > > >> > > > > discussed further in the corresponding sub-FLIPs.
> > > > > > > > >> > > > >
> > > > > > > > >> > > > > Coming back to the core of the proposal: it is not
> > > about
> > > > > > > turning
> > > > > > > > >> > Flink
> > > > > > > > >> > > > into
> > > > > > > > >> > > > > an AI framework. It is about making the core
> objects
> > > and
> > > > > > > > execution
> > > > > > > > >> > > > patterns
> > > > > > > > >> > > > > of AI-oriented data processing first-class
> citizens in
> > > > > > Flink.
> > > > > > > > >> > > > >
> > > > > > > > >> > > > > Best,
> > > > > > > > >> > > > > Guowei
> > > > > > > > >> > > > >
> > > > > > > > >> > > > >
> > > > > > > > >> > > > > On Thu, Apr 30, 2026 at 5:37 AM Yaroslav
> Tkachenko <
> > > > > > > > >> > [email protected]
> > > > > > > > >> > > > >
> > > > > > > > >> > > > > wrote:
> > > > > > > > >> > > > >
> > > > > > > > >> > > > > > Hi Guowei,
> > > > > > > > >> > > > > >
> > > > > > > > >> > > > > > Thank you for writing this proposal.
> > > > > > > > >> > > > > >
> > > > > > > > >> > > > > > I may be in the minority here, but I hope my
> voice
> > > > will
> > > > > be
> > > > > > > > >> heard. I
> > > > > > > > >> > > > > > disagree with turning Flink into an "AI-Native"
> > > > engine.
> > > > > > > > >> > > > > >
> > > > > > > > >> > > > > > Regarding your "Data processing is entering the
> AI
> > > > era,
> > > > > > and
> > > > > > > > Flink
> > > > > > > > >> > > > needs to
> > > > > > > > >> > > > > > evolve from a traditional BI compute engine
> into a
> > > > data
> > > > > > > engine
> > > > > > > > >> that
> > > > > > > > >> > > > > > natively supports AI workloads" claim:
> > > > > > > > >> > > > > >
> > > > > > > > >> > > > > > - How exactly do you define "AI"? I don't
> believe
> > > > there
> > > > > > is a
> > > > > > > > >> > standard
> > > > > > > > >> > > > > > definition. For example, Machine Learning have
> been
> > > > > around
> > > > > > > for
> > > > > > > > >> more
> > > > > > > > >> > > > than a
> > > > > > > > >> > > > > > decade, but there were no proposals (or need,
> in my
> > > > > > opinion)
> > > > > > > > to
> > > > > > > > >> > turn
> > > > > > > > >> > > > Flink
> > > > > > > > >> > > > > > into an "ML-Native" engine. Flink, in its
> current
> > > > state,
> > > > > > has
> > > > > > > > >> > > > > > been successfully used in many systems alongside
> > > > > dedicated
> > > > > > > ML
> > > > > > > > >> > > > technologies,
> > > > > > > > >> > > > > > like feature stores. Based on the context of
> your
> > > > > > proposal,
> > > > > > > it
> > > > > > > > >> > looks
> > > > > > > > >> > > > like
> > > > > > > > >> > > > > > you mostly mean LLMs, so could you be specific
> about
> > > > the
> > > > > > > > >> language?
> > > > > > > > >> > > > > > - I wouldn't call Flink "a traditional BI
> compute
> > > > > engine".
> > > > > > > > Flink
> > > > > > > > >> > is a
> > > > > > > > >> > > > > > general data processing technology which can be
> used
> > > > > for a
> > > > > > > > >> variety
> > > > > > > > >> > of
> > > > > > > > >> > > > use
> > > > > > > > >> > > > > > cases without any BI involvement.
> > > > > > > > >> > > > > > - Do you have any proof that "Users' core
> workloads
> > > > are
> > > > > > > > rapidly
> > > > > > > > >> > > > evolving"
> > > > > > > > >> > > > > > and that they require your proposed changes?
> Case
> > > > > studies,
> > > > > > > > user
> > > > > > > > >> > > > surveys, or
> > > > > > > > >> > > > > > submitted issues about the lack of support? Big
> > > > changes
> > > > > > like
> > > > > > > > that
> > > > > > > > >> > > > require
> > > > > > > > >> > > > > > extensive validation.
> > > > > > > > >> > > > > > - And even if there is a real need to adopt some
> > > > > > LLM-driven
> > > > > > > > >> > changes,
> > > > > > > > >> > > > why
> > > > > > > > >> > > > > > now? The LLM-related tooling has been changing
> so
> > > > > rapidly,
> > > > > > > and
> > > > > > > > >> it's
> > > > > > > > >> > > > hard to
> > > > > > > > >> > > > > > predict what will be needed tomorrow. Why does
> it
> > > make
> > > > > > sense
> > > > > > > > to
> > > > > > > > >> > > > introduce
> > > > > > > > >> > > > > > changes now, and not wait for more
> standardization
> > > and
> > > > > > > > >> > consolidation?
> > > > > > > > >> > > > > >
> > > > > > > > >> > > > > > To summarize, I think there are a lot of great
> ideas
> > > > in
> > > > > > the
> > > > > > > > >> > proposal,
> > > > > > > > >> > > > but
> > > > > > > > >> > > > > > in my mind, they need to be addressed as
> tactical,
> > > > > focused
> > > > > > > > >> > changes, not
> > > > > > > > >> > > > > > under the "AI-Native" umbrella.
> > > > > > > > >> > > > > >
> > > > > > > > >> > > > > > I also wanted to address a few more specific
> points:
> > > > > > > > >> > > > > >
> > > > > > > > >> > > > > > - RpcOperator, why does it need to be managed by
> > > > Flink?
> > > > > I
> > > > > > > see
> > > > > > > > >> > > > absolutely no
> > > > > > > > >> > > > > > need to introduce the additional complexity of
> > > > > > orchestrating
> > > > > > > > >> > standalone
> > > > > > > > >> > > > > > components into the core Flink engine. I can
> > > imagine a
> > > > > > > > separate
> > > > > > > > >> > > > sub-project
> > > > > > > > >> > > > > > for an RpcOperator, which could potentially be
> > > managed
> > > > > by
> > > > > > > the
> > > > > > > > >> > > > Kubernetes
> > > > > > > > >> > > > > > Operator.
> > > > > > > > >> > > > > > - You make the case for the vectorized batch
> > > > processing,
> > > > > > but
> > > > > > > > only
> > > > > > > > >> > on
> > > > > > > > >> > > > the
> > > > > > > > >> > > > > > Python side. Why stop there? Native columnar
> > > > vectorized
> > > > > > > > execution
> > > > > > > > >> > will
> > > > > > > > >> > > > > > require end-to-end changes, including
> connectors,
> > > data
> > > > > > > format
> > > > > > > > >> > support,
> > > > > > > > >> > > > Type
> > > > > > > > >> > > > > > system support, runtime changes, etc. It seems
> > > logical
> > > > > to
> > > > > > me
> > > > > > > > to
> > > > > > > > >> > support
> > > > > > > > >> > > > > > this execution mode for Java and SQL as well.
> > > > > > > > >> > > > > > - Supporting many more data types natively
> (images,
> > > > > video,
> > > > > > > > audio,
> > > > > > > > >> > > > tensors)
> > > > > > > > >> > > > > > will make connector serializers and
> deserializers
> > > > > (SerDes)
> > > > > > > > much
> > > > > > > > >> > more
> > > > > > > > >> > > > > > challenging to implement. Even today, many
> SerDes in
> > > > > > > > officially
> > > > > > > > >> > > > supported
> > > > > > > > >> > > > > > connectors don't fully implement types like
> arrays
> > > and
> > > > > > > > structs.
> > > > > > > > >> > > > > >
> > > > > > > > >> > > > > > Thank you.
> > > > > > > > >> > > > > >
> > > > > > > > >> > > > > > On Wed, Apr 29, 2026 at 1:18 AM Guowei Ma <
> > > > > > > > [email protected]>
> > > > > > > > >> > > > wrote:
> > > > > > > > >> > > > > >
> > > > > > > > >> > > > > > > Hi Z
> > > > > > > > >> > > > > > >
> > > > > > > > >> > > > > > > Thanks for the kind words and the thoughtful
> > > > > questions.
> > > > > > > Let
> > > > > > > > me
> > > > > > > > >> > take
> > > > > > > > >> > > > them
> > > > > > > > >> > > > > > > one by one.
> > > > > > > > >> > > > > > >
> > > > > > > > >> > > > > > >    1. Throughput and latency targets
> > > > > > > > >> > > > > > >
> > > > > > > > >> > > > > > > To be honest, I don't have concrete numbers to
> > > share
> > > > > > yet.
> > > > > > > > What
> > > > > > > > >> I
> > > > > > > > >> > can
> > > > > > > > >> > > > say
> > > > > > > > >> > > > > > is
> > > > > > > > >> > > > > > > that our internal testing has already surfaced
> > > > several
> > > > > > > > >> directions
> > > > > > > > >> > > > where
> > > > > > > > >> > > > > > > Flink can be improved, and at the same time we
> > > want
> > > > to
> > > > > > > fully
> > > > > > > > >> > leverage
> > > > > > > > >> > > > > > > Flink's existing streaming shuffle
> capabilities.
> > > As
> > > > > the
> > > > > > > > >> > multimodal
> > > > > > > > >> > > > > > operator
> > > > > > > > >> > > > > > > library matures, we'll progressively publish
> > > > benchmark
> > > > > > > > results.
> > > > > > > > >> > > > > > >
> > > > > > > > >> > > > > > >    2. Built-in operators
> > > > > > > > >> > > > > > >
> > > > > > > > >> > > > > > > You're absolutely right. From what I've seen,
> our
> > > > > > internal
> > > > > > > > >> users
> > > > > > > > >> > > > already
> > > > > > > > >> > > > > > > rely on a fairly large set of multimodal
> > > operators —
> > > > > > > > >> potentially
> > > > > > > > >> > > > 100+.
> > > > > > > > >> > > > > > The
> > > > > > > > >> > > > > > > exact set the community should provide is best
> > > > > discussed
> > > > > > > in
> > > > > > > > >> > FLIP-XXX:
> > > > > > > > >> > > > > > > Built-in Multimodal Operators and AI
> Functions,
> > > and
> > > > > > > > >> contributions
> > > > > > > > >> > > > from
> > > > > > > > >> > > > > > the
> > > > > > > > >> > > > > > > community are very welcome there.
> > > > > > > > >> > > > > > >
> > > > > > > > >> > > > > > >    3. Plan for the 11 sub-FLIPs
> > > > > > > > >> > > > > > >
> > > > > > > > >> > > > > > > The sequencing follows the layering in the
> > > umbrella:
> > > > > > > > >> > > > > > >
> > > > > > > > >> > > > > > >    - Layer 1 (Core Primitives) should be
> discussed
> > > > and
> > > > > > > > aligned
> > > > > > > > >> > first,
> > > > > > > > >> > > > > > since
> > > > > > > > >> > > > > > >    the second and third layers build on it.
> > > > > > > > >> > > > > > >    - Layer 2 (API + compilation + single-node
> > > > > execution)
> > > > > > > > starts
> > > > > > > > >> > with
> > > > > > > > >> > > > > > >    getting the API discussion right — the
> Python
> > > > API,
> > > > > > how
> > > > > > > > UDFs
> > > > > > > > >> > > > declare
> > > > > > > > >> > > > > > >    resources, etc. — after which the
> single-node
> > > > > > execution
> > > > > > > > work
> > > > > > > > >> > can
> > > > > > > > >> > > > build
> > > > > > > > >> > > > > > > on
> > > > > > > > >> > > > > > >    top.
> > > > > > > > >> > > > > > >    - Layer 3 (distributed scheduling and
> > > > > checkpointing)
> > > > > > > can
> > > > > > > > >> > largely
> > > > > > > > >> > > > > > proceed
> > > > > > > > >> > > > > > >    independently in parallel.
> > > > > > > > >> > > > > > >
> > > > > > > > >> > > > > > > So while each sub-FLIP is indeed a substantial
> > > piece
> > > > > of
> > > > > > > > work,
> > > > > > > > >> > most of
> > > > > > > > >> > > > > > them
> > > > > > > > >> > > > > > > can be advanced in parallel by different
> > > > contributors
> > > > > > once
> > > > > > > > the
> > > > > > > > >> > Layer
> > > > > > > > >> > > > 1
> > > > > > > > >> > > > > > > primitives are settled.
> > > > > > > > >> > > > > > >
> > > > > > > > >> > > > > > >    4. GPU scheduling roadmap
> > > > > > > > >> > > > > > >
> > > > > > > > >> > > > > > > Could you expand a bit on which aspect of GPU
> > > > > scheduling
> > > > > > > you
> > > > > > > > >> > have in
> > > > > > > > >> > > > mind
> > > > > > > > >> > > > > > > as the complex one? "GPU scheduling" covers a
> > > fairly
> > > > > > wide
> > > > > > > > >> surface
> > > > > > > > >> > > > area
> > > > > > > > >> > > > > > > (resource declaration, operator-level
> deployment,
> > > > > > elastic
> > > > > > > > >> > scaling,
> > > > > > > > >> > > > > > > heterogeneous GPU types, fine-grained
> > > partitioning,
> > > > > > etc.),
> > > > > > > > and
> > > > > > > > >> > the
> > > > > > > > >> > > > answer
> > > > > > > > >> > > > > > > differs quite a bit depending on which
> dimension
> > > > we're
> > > > > > > > >> > discussing.
> > > > > > > > >> > > > Once I
> > > > > > > > >> > > > > > > understand your specific concern I can give a
> more
> > > > > > useful
> > > > > > > > >> > response.
> > > > > > > > >> > > > > > >
> > > > > > > > >> > > > > > > Thanks again for the support — looking
> forward to
> > > > the
> > > > > > > > continued
> > > > > > > > >> > > > > > discussion.
> > > > > > > > >> > > > > > >
> > > > > > > > >> > > > > > > Best,
> > > > > > > > >> > > > > > > Guowei
> > > > > > > > >> > > > > > >
> > > > > > > > >> > > > > > >
> > > > > > > > >> > > > > > > On Tue, Apr 28, 2026 at 4:34 PM zl z <
> > > > > > > [email protected]
> > > > > > > > >
> > > > > > > > >> > wrote:
> > > > > > > > >> > > > > > >
> > > > > > > > >> > > > > > > > Hey Guowei,
> > > > > > > > >> > > > > > > >
> > > > > > > > >> > > > > > > > Thanks for the proposal, and I think this is
> > > very
> > > > > > > > valuable. I
> > > > > > > > >> > have
> > > > > > > > >> > > > some
> > > > > > > > >> > > > > > > > question about it:
> > > > > > > > >> > > > > > > >
> > > > > > > > >> > > > > > > > 1. What are our expected throughput and
> latency
> > > > > > targets?
> > > > > > > > Do
> > > > > > > > >> we
> > > > > > > > >> > > > have any
> > > > > > > > >> > > > > > > > forward-looking tests for this?
> > > > > > > > >> > > > > > > >
> > > > > > > > >> > > > > > > > 2. AI involves a very large number of
> operators.
> > > > > > Besides
> > > > > > > > >> > allowing
> > > > > > > > >> > > > users
> > > > > > > > >> > > > > > > to
> > > > > > > > >> > > > > > > > use them through UDFs, will we also provide
> > > > commonly
> > > > > > > used
> > > > > > > > >> > built-in
> > > > > > > > >> > > > > > > > operators?
> > > > > > > > >> > > > > > > >
> > > > > > > > >> > > > > > > > 3. Each of the 11 sub-FLIPs is a major
> project
> > > > > > > involving a
> > > > > > > > >> > > > significant
> > > > > > > > >> > > > > > > > amount of changes. What is our plan for
> this?
> > > > > > > > >> > > > > > > >
> > > > > > > > >> > > > > > > > 4. GPU scheduling is extremely complex.
> What is
> > > > our
> > > > > > > > current
> > > > > > > > >> > > > roadmap for
> > > > > > > > >> > > > > > > > this?
> > > > > > > > >> > > > > > > >
> > > > > > > > >> > > > > > > > This is a very high-quality and exciting
> > > proposal.
> > > > > > > Making
> > > > > > > > >> > Flink an
> > > > > > > > >> > > > > > > > AI-native data processing engine will make
> it
> > > far
> > > > > more
> > > > > > > > >> > valuable in
> > > > > > > > >> > > > the
> > > > > > > > >> > > > > > AI
> > > > > > > > >> > > > > > > > era. Look forward to seeing it land and
> come to
> > > > > > fruition
> > > > > > > > >> soon.
> > > > > > > > >> > > > > > > >
> > > > > > > > >> > > > > > > > Robert Metzger <[email protected]>
> > > > 于2026年4月28日周二
> > > > > > > > 14:38写道：
> > > > > > > > >> > > > > > > >
> > > > > > > > >> > > > > > > > > Hey Guowei,
> > > > > > > > >> > > > > > > > >
> > > > > > > > >> > > > > > > > > Thanks for the proposal. I just took a
> brief
> > > > look,
> > > > > > > here
> > > > > > > > are
> > > > > > > > >> > some
> > > > > > > > >> > > > high
> > > > > > > > >> > > > > > > > level
> > > > > > > > >> > > > > > > > > questions:
> > > > > > > > >> > > > > > > > >
> > > > > > > > >> > > > > > > > > Regarding the RPC Operator: What is the
> > > > difference
> > > > > > to
> > > > > > > > the
> > > > > > > > >> > async
> > > > > > > > >> > > > io
> > > > > > > > >> > > > > > > > operator
> > > > > > > > >> > > > > > > > > we have already?
> > > > > > > > >> > > > > > > > >
> > > > > > > > >> > > > > > > > > "Connector API for Multimodal Data
> > > Source/Sink":
> > > > > Why
> > > > > > > do
> > > > > > > > we
> > > > > > > > >> > need
> > > > > > > > >> > > > to
> > > > > > > > >> > > > > > > touch
> > > > > > > > >> > > > > > > > > the connector API for supporting
> multimodal
> > > > data?
> > > > > > > Isn't
> > > > > > > > >> this
> > > > > > > > >> > > > more of
> > > > > > > > >> > > > > > a
> > > > > > > > >> > > > > > > > > formats concern?
> > > > > > > > >> > > > > > > > >
> > > > > > > > >> > > > > > > > > "Non-Disruptive Scaling for CPU
> Operators":
> > > How
> > > > do
> > > > > > you
> > > > > > > > want
> > > > > > > > >> > to
> > > > > > > > >> > > > > > > guarantee
> > > > > > > > >> > > > > > > > > exactly-once on that kind of scaling?
> E.g. you
> > > > > need
> > > > > > to
> > > > > > > > >> > somehow
> > > > > > > > >> > > > make a
> > > > > > > > >> > > > > > > > > handover between the old and new new
> pipeline
> > > > > > > > >> > > > > > > > >
> > > > > > > > >> > > > > > > > > Overall, I find the proposal has some
> things
> > > > which
> > > > > > > seem
> > > > > > > > >> > related
> > > > > > > > >> > > > to
> > > > > > > > >> > > > > > > making
> > > > > > > > >> > > > > > > > > Flink more AI native, but other changes
> seem
> > > > > > > orthogonal
> > > > > > > > to
> > > > > > > > >> > that.
> > > > > > > > >> > > > For
> > > > > > > > >> > > > > > > > > example the checkpoint or scaling changes
> are
> > > > > > actually
> > > > > > > > >> > unrelated
> > > > > > > > >> > > > to
> > > > > > > > >> > > > > > AI,
> > > > > > > > >> > > > > > > > and
> > > > > > > > >> > > > > > > > > just engine improvements.
> > > > > > > > >> > > > > > > > >
> > > > > > > > >> > > > > > > > >
> > > > > > > > >> > > > > > > > > On Tue, Apr 28, 2026 at 5:48 AM Guowei Ma
> <
> > > > > > > > >> > [email protected]>
> > > > > > > > >> > > > > > > wrote:
> > > > > > > > >> > > > > > > > >
> > > > > > > > >> > > > > > > > > > Hi everyone,
> > > > > > > > >> > > > > > > > > >
> > > > > > > > >> > > > > > > > > > I'd like to start a discussion on an
> > > umbrella
> > > > > > > FLIP[1]
> > > > > > > > >> that
> > > > > > > > >> > lays
> > > > > > > > >> > > > > > out a
> > > > > > > > >> > > > > > > > > > direction for evolving Flink into a data
> > > > engine
> > > > > > that
> > > > > > > > >> > natively
> > > > > > > > >> > > > > > > supports
> > > > > > > > >> > > > > > > > AI
> > > > > > > > >> > > > > > > > > > workloads.
> > > > > > > > >> > > > > > > > > >
> > > > > > > > >> > > > > > > > > > The short version: user workloads are
> > > shifting
> > > > > > from
> > > > > > > BI
> > > > > > > > >> > > > analytics to
> > > > > > > > >> > > > > > > > > > multimodal data processing centered on
> model
> > > > > > > > inference,
> > > > > > > > >> and
> > > > > > > > >> > > > this
> > > > > > > > >> > > > > > > > triggers
> > > > > > > > >> > > > > > > > > > cascading changes across the stack —
> > > > multimodal
> > > > > > data
> > > > > > > > >> > flowing
> > > > > > > > >> > > > > > through
> > > > > > > > >> > > > > > > > > > pipelines, heterogeneous CPU/GPU
> resources,
> > > > > > > vectorized
> > > > > > > > >> > > > execution,
> > > > > > > > >> > > > > > and
> > > > > > > > >> > > > > > > > > > inference tasks that run for seconds to
> > > > minutes
> > > > > on
> > > > > > > > Spot
> > > > > > > > >> > > > instances.
> > > > > > > > >> > > > > > > The
> > > > > > > > >> > > > > > > > > > proposal sketches an evolution along
> five
> > > > > > directions
> > > > > > > > >> > > > (development
> > > > > > > > >> > > > > > > > > paradigm,
> > > > > > > > >> > > > > > > > > > data model, heterogeneous resources,
> > > execution
> > > > > > > engine,
> > > > > > > > >> > fault
> > > > > > > > >> > > > > > > > tolerance),
> > > > > > > > >> > > > > > > > > > decomposed into 11 sub-FLIPs organized
> into
> > > > > three
> > > > > > > > layers:
> > > > > > > > >> > core
> > > > > > > > >> > > > > > > runtime
> > > > > > > > >> > > > > > > > > > primitives, AI workload expression and
> > > > > execution,
> > > > > > > and
> > > > > > > > >> > > > > > > production-grade
> > > > > > > > >> > > > > > > > > > operational guarantees. Most sub-FLIPs
> have
> > > no
> > > > > > hard
> > > > > > > > >> > > > dependencies on
> > > > > > > > >> > > > > > > > each
> > > > > > > > >> > > > > > > > > > other and can be advanced in parallel.
> > > > > > > > >> > > > > > > > > >
> > > > > > > > >> > > > > > > > > > A note on scope, since it's an umbrella:
> > > > > > > > >> > > > > > > > > >
> > > > > > > > >> > > > > > > > > > - In scope here: whether the evolution
> > > > > directions
> > > > > > > are
> > > > > > > > >> > > > reasonable,
> > > > > > > > >> > > > > > > > whether
> > > > > > > > >> > > > > > > > > > each sub-FLIP's motivation and proposed
> > > > approach
> > > > > > are
> > > > > > > > >> > > > well-founded,
> > > > > > > > >> > > > > > > and
> > > > > > > > >> > > > > > > > > > whether the boundaries and dependencies
> > > > between
> > > > > > > > sub-FLIPs
> > > > > > > > >> > are
> > > > > > > > >> > > > > > clear.
> > > > > > > > >> > > > > > > > > > - Out of scope here: detailed designs,
> API
> > > > > > > specifics,
> > > > > > > > and
> > > > > > > > >> > > > > > > > implementation
> > > > > > > > >> > > > > > > > > > plans of individual sub-FLIPs — those
> will
> > > go
> > > > > > > through
> > > > > > > > >> > their own
> > > > > > > > >> > > > > > > FLIPs.
> > > > > > > > >> > > > > > > > > > - Consensus criteria: agreement on the
> > > overall
> > > > > > > > direction
> > > > > > > > >> is
> > > > > > > > >> > > > > > > sufficient
> > > > > > > > >> > > > > > > > > for
> > > > > > > > >> > > > > > > > > > the umbrella to pass; passing it does
> not
> > > lock
> > > > > in
> > > > > > > any
> > > > > > > > >> > > > sub-FLIP's
> > > > > > > > >> > > > > > > > design —
> > > > > > > > >> > > > > > > > > > sub-FLIPs may still be adjusted,
> deferred,
> > > or
> > > > > > > > withdrawn
> > > > > > > > >> as
> > > > > > > > >> > they
> > > > > > > > >> > > > > > > > progress.
> > > > > > > > >> > > > > > > > > >
> > > > > > > > >> > > > > > > > > > All proposed changes are incremental —
> no
> > > > > existing
> > > > > > > > API or
> > > > > > > > >> > > > behavior
> > > > > > > > >> > > > > > is
> > > > > > > > >> > > > > > > > > > removed or altered. Compatibility
> details
> > > are
> > > > > > > covered
> > > > > > > > at
> > > > > > > > >> > the
> > > > > > > > >> > > > end of
> > > > > > > > >> > > > > > > the
> > > > > > > > >> > > > > > > > > > document.
> > > > > > > > >> > > > > > > > > >
> > > > > > > > >> > > > > > > > > > Looking forward to your feedback on the
> > > > overall
> > > > > > > > direction
> > > > > > > > >> > and
> > > > > > > > >> > > > the
> > > > > > > > >> > > > > > > > > layering.
> > > > > > > > >> > > > > > > > > >
> > > > > > > > >> > > > > > > > > > [1]
> > > > > > > > >> > > > > > > > > >
> > > > > > > > >> > > > > > > > >
> > > > > > > > >> > > > > > > >
> > > > > > > > >> > > > > > >
> > > > > > > > >> > > > > >
> > > > > > > > >> > > >
> > > > > > > > >> >
> > > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=421957275
> > > > > > > > >> > > > > > > > > >
> > > > > > > > >> > > > > > > > > > Thanks,
> > > > > > > > >> > > > > > > > > > Guowei
> > > > > > > > >> > > > > > > > > >
> > > > > > > > >> > > > > > > > >
> > > > > > > > >> > > > > > > >
> > > > > > > > >> > > > > > >
> > > > > > > > >> > > > > >
> > > > > > > > >> > > >
> > > > > > > > >> >
> > > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
>

Re: Re: [DISCUSS] FLIP-577: AI-Native Flink — An Umbrella Proposal for Multimodal Data Processing

Reply via email to