Hi Martijn,

Thanks a lot for the thoughtful follow-up, and no worries about the timing.

Your summary captures the intended framing very well: the umbrella is meant
to align the community on the workload class, the overall direction, and
the relationship between the sub-FLIPs, while each concrete design still
needs to be discussed and evaluated on its own merits.

I also agree that the feature matrix concern should remain visible during
the sub-FLIP discussions. For engine-level primitives, we should continue
to be explicit about the intended API coverage, staged rollout, and any
limitations, rather than leaving those implicit.

Thanks again for the careful review. Given the discussion so far, I will
proceed with the vote soon.

Best,
Guowei

On Wed, May 27, 2026 at 8:25 PM Martijn Visser <[email protected]>
wrote:

> Hi Guowei,
>
> Apologies for the late reply. The replies and revisions address
> several of my original concerns. The explicit non-goals, the "Why Now
> / Why Flink" sections, and the conservative defaults on runtime
> mechanisms (opt-in, no behavioral change for existing jobs) all land
> in the right place.
>
> On the umbrella's purpose: I now read it as a directional statement
> that gives the community a shared view on the workload class and how
> the sub-FLIPs relate, with each sub-FLIP discussed and voted on its
> own merits. With that framing, a +1 on the umbrella is a +1 on
> direction, not on any specific design choice. The substantive scrutiny
> lives at sub-FLIP review time, which is where I'd want it anyway.
>
> Brief notes on my four original points:
>
> 1. The "bundling" concern softens once it's clear the umbrella is
> directional. The engine-shaped capabilities (regional checkpoints,
> UAC, non-disruptive scaling) and the AI-shaped capabilities will still
> need to stand on their own at sub-FLIP review.
> 2. The feature matrix concern is still something that I see as
> concerning, but we can have those discussions for each of the
> sub-FLIPs where applicable. I'm happy to see the signals from Jark,
> Dian and Lincoln that there's commitment on that already.
> 3. The Flink ML risk question is addressed in my opinion.
> 4. The evidence question has had the input from others in the
> community, and the updated "Why Now" section. I think we also need to
> be able to experiment and see what's working on this front, but that
> shouldn't be a reason to hold back on the overall idea.
>
> To summarize, I'm comfortable with the vote proceeding. Apologies for
> the delay on my side.
>
> Best regards,
>
> Martijn
>
> Op do 21 mei 2026 om 11:19 schreef Guowei Ma <[email protected]>:
> >
> > Hi Martijn
> >
> > Thanks for the heads-up.
> >
> > That sounds good to me. Since the proposal has been stable for a while,
> > could
> > you share a rough idea of when you expect to finish your final review? A
> > tentative date would be enough, and I can hold off starting the vote
> until
> > then.
> >
> > Best,
> > Guowei
> >
> >
> > On Thu, May 21, 2026 at 4:47 PM Martijn Visser <[email protected]
> >
> > wrote:
> >
> > > Hi Guowei,
> > >
> > > I would like to have a couple of more days to take a closer final look
> > > with a final round of comments, I haven't been able to finalize
> > > everything yet on this subject.
> > >
> > > Best regards,
> > >
> > > Martijn
> > >
> > > Op wo 20 mei 2026 om 15:04 schreef Guowei Ma <[email protected]>:
> > > >
> > > > Hi all,
> > > >
> > > > Thanks everyone for the discussion and feedback on the FLIP.
> > > >
> > > > If there are no further comments or concerns, I plan to start the
> vote on
> > > > Thursday.
> > > >
> > > >
> > > > Best,
> > > > Guowei
> > > >
> > > >
> > > > On Fri, May 15, 2026 at 12:21 PM Jiangang Liu <
> [email protected]
> > > >
> > > > wrote:
> > > >
> > > > > I Greatly support this proposal. AI-oriented and multimodal data
> > > processing
> > > > > is already becoming a real production workload, and Flink is well
> > > > > positioned to serve as the dataflow backbone for these pipelines.
> > > > >
> > > > > The proposal correctly focuses on engine-level capabilities rather
> than
> > > > > turning Flink into an AI framework or model serving platform. In
> > > > > particular, the shared runtime contract across multimodal types,
> object
> > > > > references, inference invocation, GPU resource management,
> scaling, and
> > > > > recovery is essential. Without such coordination, users would have
> to
> > > > > stitch together fragmented abstractions themselves. I also
> appreciate
> > > the
> > > > > incremental and backward-compatible approach, as well as the
> > > conservative
> > > > > scope for correctness-sensitive runtime changes.
> > > > >
> > > > > This direction can help Flink extend its existing strengths in
> > > streaming,
> > > > > checkpointing, and production operations to a new class of
> AI-driven
> > > > > workloads.
> > > > >
> > > > > Best
> > > > > Jiangang Liu
> > > > >
> > > > >
> > > > > Lincoln Lee <[email protected]> 于2026年5月11日周一 14:47写道:
> > > > >
> > > > > > Hi Guowei and all,
> > > > > >
> > > > > > Big +1 on the overall direction of FLIP-577.
> > > > > >
> > > > > > From the SQL ecosystem perspective, over the past year the
> community
> > > > > > has introduced AI functions (e.g., ML_PREDICT, VECTOR_SEARCH) to
> help
> > > > > > users leverage Flink in AI scenarios. Through community
> interactions
> > > > > > and enterprise user feedback since then, we can strongly sense
> the
> > > > > > growing expectation for Flink to go further — broader multimodal
> data
> > > > > > processing capabilities, improved runtime stability under AI
> > > > > > workloads, and more optimized execution for inference-heavy
> > > pipelines.
> > > > > >
> > > > > > This FLIP addresses exactly those gaps. In particular, I'm
> excited
> > > about:
> > > > > >
> > > > > >  - Multimodal data type extensions — first-class types like
> > > > > >   Tensor, Image, Embedding, and Audio in the type system will
> unlock
> > > > > >   native SQL/Table expressions over multimodal data, rather than
> > > > > >   forcing users to work around opaque BYTES columns.
> > > > > >  - Object reference mechanism — a critical optimization for
> > > > > >   large-payload multimodal data. Passing multi-megabyte blobs
> through
> > > > > >   the standard data path is a known pain point today; a
> > > > > >   reference-based approach will significantly reduce
> serialization
> > > > > >   overhead and memory pressure.
> > > > > >  - Richer built-in AI functions — standardizing common multimodal
> > > > > >   operations (embedding generation, similarity search, model
> > > > > >   inference) as engine-level functions will eliminate the
> repeated
> > > > > >   ad-hoc UDF implementations we see across organizations today,
> and
> > > > > >   open the door for engine-level optimizations like shared model
> > > > > >   handles and adaptive batching.
> > > > > >
> > > > > > As Jark mentioned, we'd like to participate in and contribute to
> the
> > > > > > relevant sub-FLIPs.
> > > > > >
> > > > > >
> > > > > > Best,
> > > > > > Lincoln Lee
> > > > > >
> > > > > >
> > > > > > Guowei Ma <[email protected]> 于2026年5月10日周日 21:26写道:
> > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > Thanks again for the thoughtful feedback and the valuable
> > > perspectives
> > > > > > > shared in this discussion.
> > > > > > >
> > > > > > > I have updated FLIP-577 [1] based on the discussion in this
> > > thread. The
> > > > > > > overall direction remains the same, but I have tried to make
> the
> > > scope,
> > > > > > > motivation, and boundaries clearer.
> > > > > > >
> > > > > > > The main changes are:
> > > > > > >
> > > > > > >    1.
> > > > > > >
> > > > > > >    Clarified the target workload as AI-oriented data processing
> > > > > > workloads,
> > > > > > >    instead of relying only on the broader "AI-Native" wording.
> > > > > > >    2.
> > > > > > >
> > > > > > >    Added explicit non-goals to make clear that this proposal
> does
> > > not
> > > > > aim
> > > > > > >    to turn Flink into an AI framework, ML platform, or model
> > > serving
> > > > > > > system.
> > > > > > >    3.
> > > > > > >
> > > > > > >    Added "Why Now" and "Why Flink" sections to better explain
> the
> > > > > > >    production signals, ecosystem trends, and why Flink's
> existing
> > > > > runtime
> > > > > > >    strengths are relevant here.
> > > > > > >    4.
> > > > > > >
> > > > > > >    Reworked the umbrella rationale. The key point is not that
> every
> > > > > > >    sub-FLIP is AI-specific, but that these mechanisms need a
> shared
> > > > > > runtime
> > > > > > >    contract across data representation, service invocation, GPU
> > > > > > resources,
> > > > > > >    scaling, and recovery.
> > > > > > >    5.
> > > > > > >
> > > > > > >    Clarified that engine-level primitives should consider
> > > SQL/Table,
> > > > > Java
> > > > > > >    DataStream, and Python DataFrame.
> > > > > > >    6.
> > > > > > >
> > > > > > >    Made the initial correctness scope of runtime mechanisms —
> > > > > > >    non-disruptive scaling, UAC enhancements, and Pipeline
> Region
> > > > > > > checkpoints —
> > > > > > >    more conservative, with explicit opt-in where default
> behavior
> > > is
> > > > > > > affected.
> > > > > > >
> > > > > > > I also tried to reflect the earlier questions raised in this
> > > thread in
> > > > > > the
> > > > > > > corresponding sections of the updated document.
> > > > > > >
> > > > > > > Looking forward to the continued discussion.
> > > > > > >
> > > > > > > [1]
> > > > > > >
> > > > > >
> > > > >
> > >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=421957275
> > > > > > >
> > > > > > > Best,
> > > > > > > Guowei
> > > > > > >
> > > > > > >
> > > > > > > On Sun, May 10, 2026 at 4:19 PM Xintong Song <
> > > [email protected]>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > First, big thanks to Guowei for kicking off this important
> > > > > discussion,
> > > > > > > and
> > > > > > > > to the community for the substantive engagement over the past
> > > days.
> > > > > I'd
> > > > > > > > also like to share some of my own thoughts.
> > > > > > > >
> > > > > > > > My apologies for joining late — I've been tied up with other
> > > things
> > > > > and
> > > > > > > > only recently started catching up. Given the length of this
> > > thread,
> > > > > > it's
> > > > > > > > possible I haven't fully digested every detail. If I miss
> > > anything,
> > > > > > > please
> > > > > > > > bear with me and feel free to point it out.
> > > > > > > >
> > > > > > > > Let me state my position upfront: a big +1 to this FLIP.
> Based
> > > on the
> > > > > > > > workload evolution I've been observing, I believe moving
> Flink
> > > toward
> > > > > > > > AI-Native is necessary — this is a project- and
> community-level
> > > > > effort
> > > > > > > that
> > > > > > > > matters deeply for how Flink responds to the industry shift
> > > driven by
> > > > > > AI,
> > > > > > > > and how it captures the new opportunities that come with it.
> We
> > > may
> > > > > > still
> > > > > > > > need more discussion to align on specific technical details,
> but
> > > at
> > > > > the
> > > > > > > > strategic level of where the community should head, I
> strongly
> > > > > support
> > > > > > > this
> > > > > > > > umbrella.
> > > > > > > >
> > > > > > > > Below are my thoughts on a few key questions raised in the
> > > discussion
> > > > > > so
> > > > > > > > far.
> > > > > > > >
> > > > > > > >
> > > > > > > > 1. On the scope of the umbrella
> > > > > > > >
> > > > > > > > Robert, Yaroslav, and Martijn have all pointed out that some
> > > items in
> > > > > > the
> > > > > > > > FLIP — Pipeline Region independent checkpointing, UAC
> > > enhancements,
> > > > > > > > non-disruptive scaling, etc. — are not unique to AI
> scenarios and
> > > > > would
> > > > > > > > also apply elsewhere. That observation is fair.
> > > > > > > >
> > > > > > > > However, I'd argue the right criterion for whether something
> > > belongs
> > > > > > > under
> > > > > > > > this umbrella is not whether it is *exclusively* used for
> AI, but
> > > > > > whether
> > > > > > > > it is *essential* to the central goal here — AI-Native Flink
> —
> > > i.e.,
> > > > > > > > whether AI-Native Flink would actually fall short without it.
> > > > > > > >
> > > > > > > > From that angle, I agree with what Guowei has already laid
> out:
> > > under
> > > > > > AI
> > > > > > > > workload characteristics (per-record processing cost orders
> of
> > > > > > magnitude
> > > > > > > > higher, GPU resources expensive and scarce, long and
> long-tailed
> > > > > > > > asynchronous inference), these otherwise-general mechanisms
> get
> > > > > pushed
> > > > > > to
> > > > > > > > the limit of their current implementations. They may be
> adequate
> > > for
> > > > > > > > traditional BI/ETL workloads, but they become real
> bottlenecks
> > > under
> > > > > AI
> > > > > > > > workloads. Without them, the AI-Native Flink story would be
> > > > > incomplete.
> > > > > > > So
> > > > > > > > I support including them under this umbrella.
> > > > > > > >
> > > > > > > >
> > > > > > > > 2. On usage and maintenance complexity
> > > > > > > >
> > > > > > > > I understand the concerns Yaroslav and Martijn raised about
> > > > > complexity
> > > > > > > > costs. That said, I don't think we should conservatively
> reject a
> > > > > > > > capability because of complexity, if that capability is
> critical
> > > and
> > > > > > > brings
> > > > > > > > substantial benefits. On the necessity of RpcOperator
> > > specifically,
> > > > > Zhu
> > > > > > > > Zhu, Guowei, and others have already provided answers
> earlier in
> > > this
> > > > > > > > thread, and I'll add some context from the Flink Agents side
> > > below.
> > > > > > > >
> > > > > > > > As a side observation: with the rapid progress of AI-assisted
> > > > > > development
> > > > > > > > over the past year or two, the way engineers work is
> shifting in
> > > a
> > > > > > > > noticeable way. On one hand, AI coding tools are helping
> > > developers
> > > > > > > > maintain complex software systems at lower effort. On the
> other
> > > hand,
> > > > > > > more
> > > > > > > > and more users are starting to ask AI how to use Flink
> > > correctly, and
> > > > > > > even
> > > > > > > > letting AI help them develop and operate Flink workloads.
> Usage
> > > and
> > > > > > > > maintenance complexity still matter, but their relative
> weight is
> > > > > > > arguably
> > > > > > > > going down — this is a trend worth factoring into long-term
> > > tradeoffs
> > > > > > at
> > > > > > > > the community level.
> > > > > > > >
> > > > > > > >
> > > > > > > > 3. On "wait for the industry to stabilize before integrating"
> > > > > > > >
> > > > > > > > Yaroslav raised a concern about the timing — the LLM tooling
> > > > > landscape
> > > > > > is
> > > > > > > > changing fast, and it's hard to predict what will be needed
> > > tomorrow.
> > > > > > I'd
> > > > > > > > offer a different perspective.
> > > > > > > >
> > > > > > > > It's *precisely because* the industry is evolving fast and
> > > hasn't yet
> > > > > > > > settled into a stable shape that Flink should join earlier —
> to
> > > > > > influence
> > > > > > > > and help define how that shape forms. If we just stand by
> > > waiting for
> > > > > > the
> > > > > > > > landscape to settle, by the time it does, there may not be a
> > > place
> > > > > for
> > > > > > > > Flink in it, and we'd have missed an important window.
> > > > > > > >
> > > > > > > > In fact, projects like Daft and Ray Data are increasingly
> > > defining
> > > > > the
> > > > > > de
> > > > > > > > facto standards in this space, as Leonard pointed out
> earlier in
> > > this
> > > > > > > > thread. If Flink doesn't engage proactively, even keeping up
> > > later
> > > > > will
> > > > > > > put
> > > > > > > > us in a reactive position.
> > > > > > > >
> > > > > > > > Engaging early does carry some cost of trial and error — some
> > > > > > > capabilities
> > > > > > > > may need to be deprecated or restructured down the line. But
> I'd
> > > > > argue
> > > > > > > this
> > > > > > > > is a normal part of how open-source projects evolve, and it's
> > > both
> > > > > > > > necessary and worth it.
> > > > > > > >
> > > > > > > >
> > > > > > > > 4. RpcOperator from a Flink Agents perspective
> > > > > > > >
> > > > > > > > The benefits of RpcOperator have already been articulated by
> > > several
> > > > > > > people
> > > > > > > > earlier in the thread. I'd like to add a concrete scenario
> from
> > > the
> > > > > > Flink
> > > > > > > > Agents side — the subagent pattern.
> > > > > > > >
> > > > > > > >
> > > > > > > > 4.1 Brief intro to the subagent pattern
> > > > > > > >
> > > > > > > > In agent practice, after planning, the main agent often
> spawns
> > > one or
> > > > > > > more
> > > > > > > > subagents to handle specific subtasks. Each subagent has its
> own
> > > LLM
> > > > > > > > context — its own system prompt, its own context window, its
> own
> > > > > > toolset
> > > > > > > > and permissions. The main agent only receives the final
> result
> > > from
> > > > > the
> > > > > > > > subagent, not the intermediate steps. This pattern is now
> widely
> > > > > > adopted
> > > > > > > in
> > > > > > > > the industry, and many agent skills explicitly say "spawn a
> > > subagent
> > > > > to
> > > > > > > do
> > > > > > > > X".
> > > > > > > >
> > > > > > > > Flink Agents wants to support this pattern for several
> reasons:
> > > > > > > > compatibility with the existing agent skill ecosystem;
> context
> > > > > > isolation,
> > > > > > > > so that intermediate state during subagent execution doesn't
> > > pollute
> > > > > > the
> > > > > > > > main agent's context; and one additional advantage that
> > > enterprise
> > > > > > > > production scenarios have over single-machine
> personal-assistant
> > > > > > agents —
> > > > > > > > the ability to dispatch resource-intensive subtasks to a
> shared
> > > > > > resource
> > > > > > > > pool for efficient resource reuse.
> > > > > > > >
> > > > > > > >
> > > > > > > > 4.2 Why this needs RpcOperator instead of in-place execution
> > > inside
> > > > > the
> > > > > > > > Flink Agents operator
> > > > > > > >
> > > > > > > > Why not just spawn subagents directly inside the Flink Agents
> > > > > operator?
> > > > > > > > Because subagent workloads are highly bursty — within the
> same
> > > job,
> > > > > one
> > > > > > > > minute's input might need several subagents for deep
> processing;
> > > the
> > > > > > next
> > > > > > > > minute's input might need none. Under that kind of
> randomness,
> > > > > > statically
> > > > > > > > allocating subagent resources per operator parallelism
> leaves no
> > > good
> > > > > > > > answer: under-allocate and the operator gets blown out the
> moment
> > > > > > demand
> > > > > > > > arrives; over-allocate and resources sit idle most of the
> time.
> > > > > > > >
> > > > > > > > The natural answer is to put subagent execution into a shared
> > > > > resource
> > > > > > > pool
> > > > > > > > that can be used by multiple upstream operator subtasks, and
> > > that can
> > > > > > > scale
> > > > > > > > elastically based on load. This maps directly onto the
> design and
> > > > > core
> > > > > > > > benefits of RpcOperator that Zhu Zhu laid out earlier — each
> > > > > > RpcOperator
> > > > > > > > instance forming its own Pipelined Region, which in turn
> enables
> > > > > > > > independent scaling and flexible load balancing.
> > > > > > > >
> > > > > > > >
> > > > > > > > 4.3 Why this shared resource pool should be inside Flink
> rather
> > > than
> > > > > > > > deployed externally
> > > > > > > >
> > > > > > > > Two reasons.
> > > > > > > >
> > > > > > > > First, the prompts, toolsets, and permission configurations a
> > > > > subagent
> > > > > > > > needs at execution time are essentially part of the Flink
> Agents
> > > *job
> > > > > > > > definition*. Stripping them out of the job and deploying them
> > > > > > separately
> > > > > > > > effectively splits the job definition in two, leaving users
> to
> > > > > > maintain a
> > > > > > > > synchronization relationship between an external deployment
> and
> > > the
> > > > > > Flink
> > > > > > > > job. This goes against the developer experience Flink Agents
> > > aims to
> > > > > > > > provide.
> > > > > > > >
> > > > > > > > Second, subagent execution needs to be incorporated into
> Flink's
> > > > > > > checkpoint
> > > > > > > > mechanism. The notion of "stateful" here deserves a brief
> > > > > > clarification:
> > > > > > > > each subagent task is self-contained — the full context it
> needs
> > > is
> > > > > > > handed
> > > > > > > > over by the main agent at dispatch time in one shot. That is,
> > > > > subagents
> > > > > > > > don't need to share state across records, so at the task
> level
> > > they
> > > > > > look
> > > > > > > > stateless. However, executing a single subagent task takes a
> > > > > > significant
> > > > > > > > amount of time and involves multiple rounds of model calls
> and
> > > tool
> > > > > > > calls,
> > > > > > > > where tool calls may have externally observable side effects
> > > (sending
> > > > > > > > emails, writing to external systems, etc.). This means we
> need
> > > > > failure
> > > > > > > > recovery for the in-flight computation state during task
> > > execution,
> > > > > so
> > > > > > > that
> > > > > > > > subagent execution preserves exactly-once semantics across
> > > failovers.
> > > > > > > >
> > > > > > > >
> > > > > > > > To wrap up, let me reiterate my support for the overall
> > > direction of
> > > > > > this
> > > > > > > > FLIP. There are clearly technical details still to be worked
> out,
> > > > > but I
> > > > > > > > believe the direction described by this umbrella is both
> > > necessary
> > > > > and
> > > > > > > > important for the long-term evolution of the Flink community.
> > > Looking
> > > > > > > > forward to continued discussion in the sub-FLIPs ahead.
> > > > > > > >
> > > > > > > > Thanks!
> > > > > > > >
> > > > > > > > Best,
> > > > > > > >
> > > > > > > > Xintong
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Fri, May 8, 2026 at 3:43 PM Guowei Ma <
> [email protected]>
> > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi all,
> > > > > > > > >
> > > > > > > > > Thanks everyone for the in-depth discussion over the past
> few
> > > days.
> > > > > > Let
> > > > > > > > me
> > > > > > > > > first summarize my understanding of the discussion so far.
> I
> > > see
> > > > > > strong
> > > > > > > > > interest in making Flink better support AI-oriented data
> > > processing
> > > > > > > > > workloads, especially multimodal and inference-oriented
> > > pipelines.
> > > > > At
> > > > > > > the
> > > > > > > > > same time, a recurring concern is that RpcOperator and
> some of
> > > the
> > > > > > > Layer
> > > > > > > > 3
> > > > > > > > > runtime improvements also look like general Flink
> > > capabilities, so
> > > > > > why
> > > > > > > > > should they be discussed together under the AI-Native /
> > > multimodal
> > > > > > > > > processing umbrella?
> > > > > > > > >
> > > > > > > > > I think the key question is not whether these mechanisms
> can
> > > only
> > > > > be
> > > > > > > used
> > > > > > > > > for AI scenarios, but whether they jointly form the runtime
> > > > > contract
> > > > > > > > Flink
> > > > > > > > > needs to support AI-oriented data processing workloads,
> > > especially
> > > > > > > > > multimodal inference pipelines. Capabilities such as
> > > RpcOperator,
> > > > > > > > > checkpointing, scaling, and resource management can
> certainly
> > > serve
> > > > > > > > broader
> > > > > > > > > use cases. However, under this class of workloads, they
> require
> > > > > more
> > > > > > > > > consistent runtime semantics and coordinated design, and
> > > together
> > > > > > > > determine
> > > > > > > > > whether the system can provide production-grade execution,
> > > > > recovery,
> > > > > > > and
> > > > > > > > > elasticity.
> > > > > > > > >
> > > > > > > > > Compared with traditional streaming workloads, this class
> of
> > > > > > workloads
> > > > > > > > > changes the data shape, computation pattern, and resource
> model
> > > > > > > > > significantly. Data is no longer only small structured
> > > records; it
> > > > > > may
> > > > > > > be
> > > > > > > > > images, video, audio, tensors, embeddings, or object
> > > references to
> > > > > > > > external
> > > > > > > > > large objects. Many pipelines also have a relatively
> > > shuffle-light
> > > > > > > shape,
> > > > > > > > > such as URI → preprocessing → inference → sink. The
> computation
> > > > > logic
> > > > > > > > often
> > > > > > > > > includes model inference or service-style invocation,
> either as
> > > > > > remote
> > > > > > > > > inference service invocation or local GPU /
> accelerator-backed
> > > > > > > execution.
> > > > > > > > > Therefore, the system needs to handle not only ordinary RPC
> > > calls,
> > > > > > but
> > > > > > > > also
> > > > > > > > > service discovery, backpressure propagation, batching /
> > > concurrency
> > > > > > > > > control, timeout / retry, in-flight request draining, model
> > > > > loading,
> > > > > > > GPU
> > > > > > > > > warmup, resource scheduling, and fault recovery.
> > > > > > > > >
> > > > > > > > > From this perspective, RpcOperator is not just “another
> way to
> > > call
> > > > > > an
> > > > > > > > > external service,” nor is it merely a deployment mechanism
> for
> > > GPU
> > > > > > > > > operators. More importantly, it defines a service-style
> > > operator
> > > > > > > > > abstraction: when inference becomes part of the logical
> data
> > > flow,
> > > > > > > Flink
> > > > > > > > > needs to understand and coordinate these runtime semantics,
> > > rather
> > > > > > than
> > > > > > > > > hiding inference completely behind external black-box calls
> > > inside
> > > > > > user
> > > > > > > > > code.
> > > > > > > > >
> > > > > > > > > Some of the Layer 3 runtime improvements follow the same
> logic.
> > > > > While
> > > > > > > > > mechanisms such as checkpointing or scaling are not
> exclusive
> > > to AI
> > > > > > > > > workloads, inference-oriented workloads fundamentally
> change
> > > their
> > > > > > > > > operational assumptions and cost model, making runtime
> > > behavior far
> > > > > > > more
> > > > > > > > > critical than in traditional data processing systems. When
> > > > > per-record
> > > > > > > > > computation is expensive, GPU warmup and model loading are
> > > costly,
> > > > > > and
> > > > > > > > the
> > > > > > > > > execution environment may involve elastic / preemptible
> > > resources,
> > > > > > the
> > > > > > > > cost
> > > > > > > > > of global rollback or disruptive scaling becomes much
> higher
> > > than
> > > > > in
> > > > > > > > > traditional row-at-a-time BI / ETL workloads. As a result,
> > > runtime
> > > > > > > > behavior
> > > > > > > > > that may have been acceptable for traditional workloads can
> > > > > directly
> > > > > > > > affect
> > > > > > > > > stability and resource efficiency in inference-oriented
> > > workloads.
> > > > > > > > >
> > > > > > > > > At the same time, the umbrella proposal helps provide a
> shared
> > > > > > context
> > > > > > > > for
> > > > > > > > > discussing how these capabilities relate to each other and
> what
> > > > > > common
> > > > > > > > > runtime assumptions they rely on. The more important value
> of
> > > the
> > > > > > > > umbrella
> > > > > > > > > is to align on the workload model, design principles,
> > > boundaries,
> > > > > and
> > > > > > > > > dependencies between capabilities, so that independently
> > > evolving
> > > > > > > pieces
> > > > > > > > > such as RpcOperator, GPU resource declaration, batching /
> > > > > concurrency
> > > > > > > > > control, non-disruptive scaling, and regional
> checkpointing do
> > > not
> > > > > > end
> > > > > > > up
> > > > > > > > > with inconsistent runtime semantics.
> > > > > > > > >
> > > > > > > > > Based on this discussion, I will update the proposal to
> make
> > > the
> > > > > > > workload
> > > > > > > > > model, RpcOperator boundary, and Layer 3 dependency
> > > relationship
> > > > > > > clearer.
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Guowei
> > > > > > > > > Best,
> > > > > > > > > Guowei
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Fri, May 8, 2026 at 12:37 PM zhangjiaogg <
> > > [email protected]>
> > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi Guowei and all,
> > > > > > > > > >
> > > > > > > > > > Thank you for driving this initiative. Strong +1 on the
> > > overall
> > > > > > > > > direction.
> > > > > > > > > >
> > > > > > > > > > From our perspective, the core value of this proposal
> lies
> > > in two
> > > > > > > > areas:
> > > > > > > > > > extending Flink's intelligent processing capabilities for
> > > > > > multimodal
> > > > > > > > > data,
> > > > > > > > > > and enabling native, in-pipeline local inference. As AI
> > > > > > capabilities
> > > > > > > > > > continue to advance, multimodal data is accounting for an
> > > > > > > increasingly
> > > > > > > > > > large share of overall data volume, and the ability to
> > > perform
> > > > > > > > > intelligent,
> > > > > > > > > > real-time processing on this data — not just ingestion or
> > > > > routing,
> > > > > > > but
> > > > > > > > > > actual inference and transformation within the stream —
> is
> > > > > > becoming a
> > > > > > > > > > critical requirement across industries. Today, most
> pipelines
> > > > > treat
> > > > > > > > > > multimodal objects as opaque blobs and push inference to
> > > external
> > > > > > > > > systems,
> > > > > > > > > > which works but at the cost of complexity, latency, and
> > > > > > consistency.
> > > > > > > > > >
> > > > > > > > > > This is exactly the pain one of our customers
> experiences in
> > > > > their
> > > > > > > > > > autonomous driving data pipelines. Video and image data
> > > captured
> > > > > by
> > > > > > > > > onboard
> > > > > > > > > > cameras must go through annotation, frame extraction,
> quality
> > > > > > > > filtering,
> > > > > > > > > > and both unstructured and structured data transformation
> > > before
> > > > > it
> > > > > > > can
> > > > > > > > be
> > > > > > > > > > used for model training — a workflow that today requires
> > > > > combining
> > > > > > > > > multiple
> > > > > > > > > > specialized systems: a stream engine for structured
> > > processing, a
> > > > > > > > > separate
> > > > > > > > > > framework (e.g., Ray Data) for multimodal processing,
> and an
> > > > > > external
> > > > > > > > > model
> > > > > > > > > > serving layer for inference. Each system boundary
> introduces
> > > > > > > > intermediate
> > > > > > > > > > storage, operational overhead, and data consistency
> > > challenges
> > > > > > across
> > > > > > > > the
> > > > > > > > > > full pipeline. If Flink can handle this end-to-end — with
> > > native
> > > > > > > > > multimodal
> > > > > > > > > > types, local GPU inference operators, and unified
> > > checkpointing —
> > > > > > the
> > > > > > > > > > entire workflow becomes a single Flink job, intermediate
> > > storage
> > > > > is
> > > > > > > > > > eliminated, and fault recovery covers the pipeline as a
> > > whole.
> > > > > > > > > >
> > > > > > > > > > We look forward to the sub-FLIP discussions and would be
> > > happy to
> > > > > > > > > > contribute.
> > > > > > > > > >
> > > > > > > > > > Best regards
> > > > > > > > > > Jiao Zhang
> > > > > > > > > >
> > > > > > > > > > At 2026-05-07 12:58:15, "zihao chen" <
> [email protected]>
> > > > > > wrote:
> > > > > > > > > > >Hi, all,
> > > > > > > > > > >
> > > > > > > > > > >I’d like to share some thoughts based on our internal
> > > experience
> > > > > > > > > > >with AI workloads on Flink.
> > > > > > > > > > >
> > > > > > > > > > >At Tencent, we have production scenarios where Flink is
> > > used in
> > > > > > > > > > >AI-related pipelines.
> > > > > > > > > > >
> > > > > > > > > > >Based on these workloads, we explored elasticity and
> > > autoscaling
> > > > > > > > > > >for cloud-native stream processing systems and
> published our
> > > > > > > > > > >experience in SIGMOD 2025:
> > > > > > > > > > >
> > > > > > > > > > >"Oceanus: Enable SLO-Aware Vertical Autoscaling for
> > > > > > > > > > >Cloud-Native Streaming Services in Tencent" [1]
> > > > > > > > > > >
> > > > > > > > > > >As our workloads evolved, we also started to see
> increasing
> > > > > > > > > > >GPU-based training and inference scenarios.
> > > > > > > > > > >
> > > > > > > > > > >Our current solution integrates Flink with external GPU
> > > > > services.
> > > > > > > > > > >While this works functionally, it also introduces
> several
> > > > > > practical
> > > > > > > > > > >issues, such as:
> > > > > > > > > > >
> > > > > > > > > > >   -
> > > > > > > > > > >
> > > > > > > > > > >   fragmented lifecycle management
> > > > > > > > > > >   -
> > > > > > > > > > >
> > > > > > > > > > >   operational complexity
> > > > > > > > > > >   -
> > > > > > > > > > >
> > > > > > > > > > >   inconsistent scaling/recovery behavior across systems
> > > > > > > > > > >
> > > > > > > > > > >From this perspective, I think FLIP-577 is exploring a
> very
> > > > > > > > > > >meaningful direction.
> > > > > > > > > > >
> > > > > > > > > > >In particular, I agree with the idea of integrating
> > > GPU-backed
> > > > > > > > > > >computation more naturally into Flink’s runtime model,
> > > instead
> > > > > of
> > > > > > > > > > >treating it purely as an external service integration
> > > problem.
> > > > > > > > > > >
> > > > > > > > > > >Besides, from the elasticity perspective, our
> experience is
> > > that
> > > > > > > > > > >GPU workloads have very different characteristics
> compared
> > > with
> > > > > > > > > > >traditional CPU workloads:
> > > > > > > > > > >
> > > > > > > > > > >   -
> > > > > > > > > > >
> > > > > > > > > > >   GPU resources are expensive and scarce
> > > > > > > > > > >   -
> > > > > > > > > > >
> > > > > > > > > > >   Startup and replay costs are significantly higher
> > > > > > > > > > >   -
> > > > > > > > > > >
> > > > > > > > > > >   Long-running tasks make scaling and recovery more
> > > challenging
> > > > > > > > > > >
> > > > > > > > > > >In our experience, GPU elasticity cannot simply reuse
> the
> > > > > > > > > > >assumptions behind traditional CPU elasticity.
> > > > > > > > > > >
> > > > > > > > > > >Because of this, elasticity becomes especially
> important for
> > > > > > > > > > >production AI workloads, not only for resource
> efficiency,
> > > but
> > > > > > also
> > > > > > > > > > >for reducing scaling and recovery overhead.
> > > > > > > > > > >
> > > > > > > > > > >More broadly, AI workloads increasingly require Flink to
> > > > > > collaborate
> > > > > > > > > > >more naturally with GPU-backed computation, and I
> believe
> > > > > > > > > > >FLIP-577 is exploring an important direction toward
> > > addressing
> > > > > > > > > > >this gap.
> > > > > > > > > > >
> > > > > > > > > > >Overall, I’m looking forward to further discussions
> about
> > > this
> > > > > > FLIP.
> > > > > > > > > > >
> > > > > > > > > > >[1] https://dl.acm.org/doi/abs/10.1145/3722212.3724445
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >Best,
> > > > > > > > > > >Zihao Chen
> > > > > > > > > > >
> > > > > > > > > > >Yong Fang <[email protected]> 于2026年5月7日周四 11:11写道:
> > > > > > > > > > >
> > > > > > > > > > >> Hi devs,
> > > > > > > > > > >>
> > > > > > > > > > >> Thanks Guowei for initiating this proposal. I think
> this
> > > is an
> > > > > > > > > important
> > > > > > > > > > >> step for Flink towards the era of AI data processing,
> > > very big
> > > > > > +1.
> > > > > > > > > > >>
> > > > > > > > > > >> I’d like to share some scenarios and requirements of
> > > > > leveraging
> > > > > > > > > PyFlink
> > > > > > > > > > for
> > > > > > > > > > >> AI data processing at ByteDance. Currently, we run
> tens of
> > > > > > > thousands
> > > > > > > > > of
> > > > > > > > > > >> pyflink/flink jobs, using millions of CPU cores.
> > > > > > > > > > >>
> > > > > > > > > > >> 1) Multimodal Data Processing
> > > > > > > > > > >> We want to use PyFlink to generate multimodal feature
> > > data.
> > > > > The
> > > > > > > > > typical
> > > > > > > > > > >> workflow starts by reading ID-based raw data and
> > > performing
> > > > > > large
> > > > > > > > > table
> > > > > > > > > > >> joins and ETL computations. We then fetch multimodal
> > > assets
> > > > > such
> > > > > > > as
> > > > > > > > > > images,
> > > > > > > > > > >> videos, texts and audios from object storage by ID.
> These
> > > > > > > multimodal
> > > > > > > > > > data
> > > > > > > > > > >> are either sent to an RPC service (backed by local
> models
> > > or
> > > > > > > remote
> > > > > > > > > > large
> > > > > > > > > > >> models), or processed via local GPU computing for
> frame
> > > > > > > extraction,
> > > > > > > > > > >> embedding generation and other tasks. After multimodal
> > > > > > > computation,
> > > > > > > > > > output
> > > > > > > > > > >> results including embeddings, processed images and
> > > multimodal
> > > > > > > > metadata
> > > > > > > > > > are
> > > > > > > > > > >> generated and persisted into the downstream multimodal
> > > data
> > > > > > lake.
> > > > > > > > > > >>
> > > > > > > > > > >> 2) Stream-Batch Unified Data Training
> > > > > > > > > > >> We use PyFlink to consume processed sample data from
> MQ or
> > > > > data
> > > > > > > > lakes.
> > > > > > > > > > >> Within the data pipeline, data may be shuffled by key,
> > > then
> > > > > fed
> > > > > > > into
> > > > > > > > > > >> parameter servers or local services for CPU-based or
> > > GPU-based
> > > > > > > model
> > > > > > > > > > >> training. Such workloads strongly demand optimized
> CPU &
> > > GPU
> > > > > > > hybrid
> > > > > > > > > > >> scheduling, worker node restart capability, fast
> scaling,
> > > as
> > > > > > well
> > > > > > > as
> > > > > > > > > > native
> > > > > > > > > > >> support for unified streaming and batch training.
> > > > > > > > > > >>
> > > > > > > > > > >> While supporting the above two categories of
> workloads, we
> > > > > have
> > > > > > > > > several
> > > > > > > > > > >> common requirements for PyFlink and Flink Core:
> > > > > > > > > > >>
> > > > > > > > > > >> 1) Native Python Computing Capability
> > > > > > > > > > >> We need more user-friendly DataFrame APIs,
> comprehensive
> > > > > > built-in
> > > > > > > > data
> > > > > > > > > > >> types for image and audio, as well as richer
> multimodal
> > > > > > computing
> > > > > > > > > > >> operators. This enables users to develop multimodal
> data
> > > > > > > processing
> > > > > > > > > jobs
> > > > > > > > > > >> more efficiently, and allows better optimization and
> > > > > scheduling
> > > > > > at
> > > > > > > > the
> > > > > > > > > > >> execution plan layer.
> > > > > > > > > > >>
> > > > > > > > > > >> 2) CPU & GPU Scheduling and Resource Management
> > > > > > > > > > >> It is expected to tag resource requirements at
> > > fine-grained
> > > > > > levels
> > > > > > > > > such
> > > > > > > > > > as
> > > > > > > > > > >> python user-defined functions. The scheduler should
> > > provide
> > > > > > > enhanced
> > > > > > > > > > >> resource orchestration and task scheduling, enabling
> more
> > > > > > flexible
> > > > > > > > > > >> heterogeneous resource management.
> > > > > > > > > > >>
> > > > > > > > > > >> 3) Embedded Server Node Capability in Pipeline
> > > > > > > > > > >> We hope to launch dedicated server nodes inside a
> single
> > > > > > pipeline,
> > > > > > > > to
> > > > > > > > > > load
> > > > > > > > > > >> local models or access remote models, which may be
> shared
> > > > > > between
> > > > > > > > > > different
> > > > > > > > > > >> operators in the pipeline. This unifies data ETL and
> > > > > multimodal
> > > > > > > > > > computing
> > > > > > > > > > >> within one end-to-end pipeline, greatly simplifying
> > > operation
> > > > > > and
> > > > > > > > > > >> maintenance for business teams.
> > > > > > > > > > >>
> > > > > > > > > > >> 4) Performance and Stability Optimization
> > > > > > > > > > >> Key enhancements include zero-copy data transfer
> between
> > > Flink
> > > > > > TM
> > > > > > > > > > processes
> > > > > > > > > > >> and Python processes, fast scaling of compute nodes,
> fast
> > > > > > > > > checkpointing
> > > > > > > > > > >> mechanism, and shuffle optimization for large-scale
> > > datasets.
> > > > > > > These
> > > > > > > > > > >> improvements will significantly boost the performance
> and
> > > > > > > stability
> > > > > > > > of
> > > > > > > > > > >> PyFlink multimodal workloads.
> > > > > > > > > > >>
> > > > > > > > > > >> I’m really excited to see that FLIP-577 has covered
> all
> > > the
> > > > > real
> > > > > > > > > > production
> > > > > > > > > > >> scenarios and requirements mentioned above. It's a
> good
> > > chance
> > > > > > to
> > > > > > > > > > iterate
> > > > > > > > > > >> and enrich the core capabilities of PyFlink and Flink
> Core
> > > > > > > targeting
> > > > > > > > > > these
> > > > > > > > > > >> AI data processing scenarios, and build Flink into a
> > > > > first-class
> > > > > > > AI
> > > > > > > > > data
> > > > > > > > > > >> processing engine.
> > > > > > > > > > >>
> > > > > > > > > > >> I’m very much looking forward to the progress.
> > > > > > > > > > >>
> > > > > > > > > > >> Best,
> > > > > > > > > > >> FangYong
> > > > > > > > > > >>
> > > > > > > > > > >> On Wed, May 6, 2026 at 8:36 PM FeatZhang <
> > > > > [email protected]>
> > > > > > > > > wrote:
> > > > > > > > > > >>
> > > > > > > > > > >> > Hi all,
> > > > > > > > > > >> >
> > > > > > > > > > >> > This is a great topic, and honestly long overdue.
> > > > > > > > > > >> >
> > > > > > > > > > >> > With the rapid growth of AI applications, we have
> been
> > > > > seeing
> > > > > > a
> > > > > > > > > > >> > significant increase in real-world demands from
> users
> > > who
> > > > > are
> > > > > > > > > already
> > > > > > > > > > >> > building on Flink and other traditional data
> processing
> > > or
> > > > > BI
> > > > > > > > > engines.
> > > > > > > > > > >> > From a platform perspective, more and more teams are
> > > trying
> > > > > to
> > > > > > > > > > >> > integrate AI capabilities directly into their
> existing
> > > > > > streaming
> > > > > > > > > > >> > pipelines, rather than treating them as separate
> > > systems.
> > > > > > > > > > >> >
> > > > > > > > > > >> > This is not an isolated trend — it is becoming a
> common
> > > > > > > > requirement
> > > > > > > > > > >> > across industries.
> > > > > > > > > > >> >
> > > > > > > > > > >> > ________________________________
> > > > > > > > > > >> >
> > > > > > > > > > >> > 1. What is changing in real systems
> > > > > > > > > > >> >
> > > > > > > > > > >> > We are observing a consistent shift:
> > > > > > > > > > >> >
> > > > > > > > > > >> > AI is moving from offline analysis or request-time
> > > scoring
> > > > > to
> > > > > > > > > > >> > continuous, event-driven decision making.
> > > > > > > > > > >> >
> > > > > > > > > > >> > In other words:
> > > > > > > > > > >> >
> > > > > > > > > > >> > AI is becoming part of the data stream itself.
> > > > > > > > > > >> >
> > > > > > > > > > >> > ________________________________
> > > > > > > > > > >> >
> > > > > > > > > > >> > 2. Representative production scenarios
> > > > > > > > > > >> >
> > > > > > > > > > >> > 2.1 Real-time fraud detection (per-event decision
> under
> > > > > strict
> > > > > > > > > > latency)
> > > > > > > > > > >> >
> > > > > > > > > > >> > Typical setup
> > > > > > > > > > >> >
> > > > > > > > > > >> > Continuous transaction stream (payments, logins,
> > > transfers)
> > > > > > > > > > >> > Each event must be evaluated within milliseconds
> > > > > > > > > > >> > Decision depends on:
> > > > > > > > > > >> >
> > > > > > > > > > >> > recent user behavior
> > > > > > > > > > >> > device / IP patterns
> > > > > > > > > > >> > short-term aggregates
> > > > > > > > > > >> >
> > > > > > > > > > >> > What is happening in practice
> > > > > > > > > > >> >
> > > > > > > > > > >> > Models are already deeply integrated into decision
> flow
> > > > > > > > > > >> > Feature freshness directly impacts detection
> accuracy
> > > > > > > > > > >> >
> > > > > > > > > > >> > Current pain
> > > > > > > > > > >> >
> > > > > > > > > > >> > Features computed in streaming, but inference is
> remote
> > > > > > > > > > >> > Network overhead adds to critical path latency
> > > > > > > > > > >> > Hard to ensure training/serving consistency
> > > > > > > > > > >> >
> > > > > > > > > > >> > ________________________________
> > > > > > > > > > >> >
> > > > > > > > > > >> > 2.2 Real-time recommendation and ads (continuous
> > > re-ranking)
> > > > > > > > > > >> >
> > > > > > > > > > >> > Typical setup
> > > > > > > > > > >> >
> > > > > > > > > > >> > User interaction stream (click, view, dwell time)
> > > > > > > > > > >> > Continuous feature updates (session + short-term
> > > behavior)
> > > > > > > > > > >> > Inference triggered per interaction
> > > > > > > > > > >> >
> > > > > > > > > > >> > What is happening in practice
> > > > > > > > > > >> >
> > > > > > > > > > >> > Increasing reliance on real-time context
> > > > > > > > > > >> > Model-based ranking becomes core logic
> > > > > > > > > > >> >
> > > > > > > > > > >> > Current pain
> > > > > > > > > > >> >
> > > > > > > > > > >> > Offline and online feature pipelines diverge
> > > > > > > > > > >> > Training-serving skew is common
> > > > > > > > > > >> > Inference orchestration is ad-hoc
> > > > > > > > > > >> >
> > > > > > > > > > >> > ________________________________
> > > > > > > > > > >> >
> > > > > > > > > > >> > 2.3 Streaming RAG / knowledge systems (continuous
> > > indexing)
> > > > > > > > > > >> >
> > > > > > > > > > >> > Typical setup
> > > > > > > > > > >> >
> > > > > > > > > > >> > Continuous ingestion of documents, logs, or
> knowledge
> > > > > > > > > > >> > Pipeline:
> > > > > > > > > > >> >
> > > > > > > > > > >> > chunking → embedding → indexing → retrieval
> > > > > > > > > > >> >
> > > > > > > > > > >> > Typical use cases
> > > > > > > > > > >> >
> > > > > > > > > > >> > AI copilots
> > > > > > > > > > >> > enterprise knowledge assistants
> > > > > > > > > > >> > observability systems
> > > > > > > > > > >> >
> > > > > > > > > > >> > Current pain
> > > > > > > > > > >> >
> > > > > > > > > > >> > Built via loosely coupled services or scripts
> > > > > > > > > > >> > No strong consistency guarantees
> > > > > > > > > > >> > Difficult to scale and recover
> > > > > > > > > > >> >
> > > > > > > > > > >> > ________________________________
> > > > > > > > > > >> >
> > > > > > > > > > >> > 2.4 Real-time feedback loop (continuous evaluation)
> > > > > > > > > > >> >
> > > > > > > > > > >> > Typical setup
> > > > > > > > > > >> >
> > > > > > > > > > >> > Prediction at time T
> > > > > > > > > > >> > Label arrives at T + Δ
> > > > > > > > > > >> >
> > > > > > > > > > >> > Required processing
> > > > > > > > > > >> >
> > > > > > > > > > >> > prediction stream JOIN label stream → metrics →
> > > optimization
> > > > > > > > > > >> >
> > > > > > > > > > >> > Current pain
> > > > > > > > > > >> >
> > > > > > > > > > >> > Alignment logic duplicated across systems
> > > > > > > > > > >> > Late data handling is complex
> > > > > > > > > > >> > No reusable evaluation abstraction
> > > > > > > > > > >> >
> > > > > > > > > > >> > ________________________________
> > > > > > > > > > >> >
> > > > > > > > > > >> > 2.5 AI replacing rule-based decision logic
> > > > > > > > > > >> >
> > > > > > > > > > >> > Evolution
> > > > > > > > > > >> >
> > > > > > > > > > >> > From:
> > > > > > > > > > >> >
> > > > > > > > > > >> > rule engine / CEP
> > > > > > > > > > >> >
> > > > > > > > > > >> > To:
> > > > > > > > > > >> >
> > > > > > > > > > >> > model / LLM → decision
> > > > > > > > > > >> >
> > > > > > > > > > >> > Implication
> > > > > > > > > > >> >
> > > > > > > > > > >> > AI is becoming the core decision layer inside
> streaming
> > > > > > systems.
> > > > > > > > > > >> >
> > > > > > > > > > >> > ________________________________
> > > > > > > > > > >> >
> > > > > > > > > > >> > 3. Architectural shift
> > > > > > > > > > >> >
> > > > > > > > > > >> > Across all scenarios:
> > > > > > > > > > >> >
> > > > > > > > > > >> > From:
> > > > > > > > > > >> >
> > > > > > > > > > >> > data processing → feature system → model serving →
> > > > > evaluation
> > > > > > > > > > >> >
> > > > > > > > > > >> > To:
> > > > > > > > > > >> >
> > > > > > > > > > >> > stream = feature + inference + decision + feedback
> > > > > > > > > > >> >
> > > > > > > > > > >> > This reflects a fundamental change in system
> boundaries.
> > > > > > > > > > >> >
> > > > > > > > > > >> > ________________________________
> > > > > > > > > > >> >
> > > > > > > > > > >> > 4. Why externalized architectures break down
> > > > > > > > > > >> >
> > > > > > > > > > >> > Most current implementations rely on multiple
> systems:
> > > > > > > > > > >> >
> > > > > > > > > > >> > stream processing
> > > > > > > > > > >> > feature store
> > > > > > > > > > >> > model serving
> > > > > > > > > > >> > vector database
> > > > > > > > > > >> >
> > > > > > > > > > >> > This introduces several fundamental issues in
> real-time
> > > > > > > scenarios.
> > > > > > > > > > >> >
> > > > > > > > > > >> > 4.1 Latency dominated by system boundaries
> > > > > > > > > > >> >
> > > > > > > > > > >> > stream → network → model service → response
> > > > > > > > > > >> >
> > > > > > > > > > >> > network overhead is unavoidable
> > > > > > > > > > >> > batching is not controlled by the stream runtime
> > > > > > > > > > >> > no end-to-end backpressure
> > > > > > > > > > >> >
> > > > > > > > > > >> > Latency becomes a system-level artifact rather than
> a
> > > > > compute
> > > > > > > > > > property.
> > > > > > > > > > >> >
> > > > > > > > > > >> > ________________________________
> > > > > > > > > > >> >
> > > > > > > > > > >> > 4.2 Inconsistent data between training and serving
> > > > > > > > > > >> >
> > > > > > > > > > >> > offline vs online features
> > > > > > > > > > >> > different definitions or time windows
> > > > > > > > > > >> >
> > > > > > > > > > >> > Models operate on inconsistent data distributions.
> > > > > > > > > > >> >
> > > > > > > > > > >> > ________________________________
> > > > > > > > > > >> >
> > > > > > > > > > >> > 4.3 State fragmentation
> > > > > > > > > > >> >
> > > > > > > > > > >> > user/session context must be rebuilt or fetched
> > > > > > > > > > >> > loss of data locality
> > > > > > > > > > >> > processing becomes call-driven
> > > > > > > > > > >> >
> > > > > > > > > > >> > ________________________________
> > > > > > > > > > >> >
> > > > > > > > > > >> > 4.4 Feedback loop is not composable
> > > > > > > > > > >> >
> > > > > > > > > > >> > difficult alignment of prediction and label streams
> > > > > > > > > > >> > no unified handling of late data
> > > > > > > > > > >> > duplicated evaluation logic
> > > > > > > > > > >> >
> > > > > > > > > > >> > ________________________________
> > > > > > > > > > >> >
> > > > > > > > > > >> > 4.5 Operational complexity
> > > > > > > > > > >> >
> > > > > > > > > > >> > multiple systems to scale
> > > > > > > > > > >> > multiple failure domains
> > > > > > > > > > >> > complex debugging paths
> > > > > > > > > > >> >
> > > > > > > > > > >> > ________________________________
> > > > > > > > > > >> >
> > > > > > > > > > >> > 5. Why this aligns with Flink
> > > > > > > > > > >> >
> > > > > > > > > > >> > These workloads require:
> > > > > > > > > > >> >
> > > > > > > > > > >> > event-driven execution
> > > > > > > > > > >> > strong state management
> > > > > > > > > > >> > precise time semantics
> > > > > > > > > > >> > continuous feedback
> > > > > > > > > > >> >
> > > > > > > > > > >> > These are exactly Flink’s core strengths.
> > > > > > > > > > >> >
> > > > > > > > > > >> > The key insight is:
> > > > > > > > > > >> >
> > > > > > > > > > >> > Inference should be modeled as a dataflow operator,
> not
> > > an
> > > > > > > > external
> > > > > > > > > > >> > service.
> > > > > > > > > > >> >
> > > > > > > > > > >> > ________________________________
> > > > > > > > > > >> >
> > > > > > > > > > >> > 6. Implication
> > > > > > > > > > >> >
> > > > > > > > > > >> > If we model:
> > > > > > > > > > >> >
> > > > > > > > > > >> > data → feature → inference → decision → feedback
> > > > > > > > > > >> >
> > > > > > > > > > >> > within Flink, we can achieve:
> > > > > > > > > > >> >
> > > > > > > > > > >> > unified scheduling
> > > > > > > > > > >> > shared state
> > > > > > > > > > >> > consistent time semantics
> > > > > > > > > > >> > end-to-end fault tolerance
> > > > > > > > > > >> >
> > > > > > > > > > >> > ________________________________
> > > > > > > > > > >> >
> > > > > > > > > > >> > 7. Conclusion
> > > > > > > > > > >> >
> > > > > > > > > > >> > This is not simply about adding AI support to Flink.
> > > > > > > > > > >> >
> > > > > > > > > > >> > It is about recognizing that:
> > > > > > > > > > >> >
> > > > > > > > > > >> > Real-time AI systems are fundamentally streaming
> > > systems.
> > > > > > > > > > >> >
> > > > > > > > > > >> > The question is whether Flink evolves to support
> this
> > > > > > natively,
> > > > > > > or
> > > > > > > > > > >> > remains a preprocessing layer in front of external
> AI
> > > > > stacks.
> > > > > > > > > > >> >
> > > > > > > > > > >> > ________________________________
> > > > > > > > > > >> >
> > > > > > > > > > >> > Happy to follow up with a more concrete proposal
> (e.g.,
> > > > > > > inference
> > > > > > > > > > >> > operator abstraction) if there is interest.
> > > > > > > > > > >> >
> > > > > > > > > > >> > Thanks.
> > > > > > > > > > >> >
> > > > > > > > > > >> > On Mon, May 4, 2026 at 10:31 PM Gen Luo <
> > > > > [email protected]>
> > > > > > > > > wrote:
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > Hi all,
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > Thank you Guowei Ma for driving this discussion,
> and
> > > > > thanks
> > > > > > > > > everyone
> > > > > > > > > > >> for
> > > > > > > > > > >> > > the valuable insights. Inspired by this exchange,
> I’d
> > > like
> > > > > > to
> > > > > > > > > share
> > > > > > > > > > a
> > > > > > > > > > >> few
> > > > > > > > > > >> > > thoughts.
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > While “AI-Native” covers broad ground, I believe
> this
> > > FLIP
> > > > > > > does
> > > > > > > > > not
> > > > > > > > > > >> > > overextend Flink’s scope. It’s a necessary
> iteration
> > > > > driven
> > > > > > by
> > > > > > > > > > evolving
> > > > > > > > > > >> > > user scenarios and AI advancements, particularly
> > > > > multimodal
> > > > > > > > > > processing.
> > > > > > > > > > >> > > Given the growing adoption of multimodal
> applications
> > > and
> > > > > > > > > increasing
> > > > > > > > > > >> > > interest in low-latency inference, initiating
> these
> > > > > > > enhancements
> > > > > > > > > is
> > > > > > > > > > a
> > > > > > > > > > >> > > timely step to better align Flink with evolving AI
> > > > > > workloads.
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > From our engagements with customers and
> developers, we
> > > > > > > observe a
> > > > > > > > > > clear
> > > > > > > > > > >> > > shift in both workloads and user expectations.
> Model
> > > > > > inference
> > > > > > > > is
> > > > > > > > > > >> > > increasingly central to data pipelines, with
> > > multimodal AI
> > > > > > > tasks
> > > > > > > > > > >> growing
> > > > > > > > > > >> > > rapidly. Traditional real-time scenarios (e.g.,
> > > monitoring
> > > > > > and
> > > > > > > > > > >> analytics)
> > > > > > > > > > >> > > now leverage models and agent frameworks like
> Flink
> > > Agent
> > > > > > for
> > > > > > > > > > >> > intelligent,
> > > > > > > > > > >> > > multi-turn decision-making, while large-scale
> offline
> > > > > > compute
> > > > > > > is
> > > > > > > > > > also
> > > > > > > > > > >> > > shifting toward LLMs and vision models. Alongside
> this
> > > > > > > workload
> > > > > > > > > > >> > evolution,
> > > > > > > > > > >> > > developer workflows have adapted: AI practitioners
> > > > > naturally
> > > > > > > > > prefer
> > > > > > > > > > >> > Python
> > > > > > > > > > >> > > and DataFrame-style APIs. As AI-assisted coding
> > > matures,
> > > > > > > > aligning
> > > > > > > > > > >> system
> > > > > > > > > > >> > > interfaces with these familiar patterns will
> directly
> > > > > > improve
> > > > > > > > > > >> > AI-generated
> > > > > > > > > > >> > > code quality and significantly lower adoption
> > > barriers for
> > > > > > the
> > > > > > > > AI
> > > > > > > > > > >> > community.
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > Today, many AI evaluation tools don’t yet
> recommend
> > > Flink
> > > > > > for
> > > > > > > AI
> > > > > > > > > > >> > > workloads—largely due to limited visibility of
> Flink’s
> > > > > > > relevant
> > > > > > > > > > >> > > capabilities rather than fundamental
> incompatibility.
> > > In
> > > > > > > > reality,
> > > > > > > > > > Flink
> > > > > > > > > > >> > has
> > > > > > > > > > >> > > unique strengths here. For example, generating
> > > multimodal
> > > > > > > > samples
> > > > > > > > > is
> > > > > > > > > > >> > often
> > > > > > > > > > >> > > a multi-day, GPU-heavy process. Flink’s streaming
> > > model,
> > > > > > > > combined
> > > > > > > > > > with
> > > > > > > > > > >> > > checkpointing and reduced disk I/O, is
> well-suited for
> > > > > such
> > > > > > > > > > >> long-running
> > > > > > > > > > >> > > tasks—a direction also pursued by engines like
> Daft
> > > and
> > > > > Ray
> > > > > > > > Data.
> > > > > > > > > > With
> > > > > > > > > > >> > > Flink’s proven production stability, we’re
> > > well-positioned
> > > > > > for
> > > > > > > > > both
> > > > > > > > > > >> batch
> > > > > > > > > > >> > > and future real-time multimodal streaming
> inference.
> > > > > > Targeted
> > > > > > > > > > >> > improvements
> > > > > > > > > > >> > > can make these advantages visible, driving better
> user
> > > > > > > > experiences
> > > > > > > > > > and
> > > > > > > > > > >> > > healthier ecosystem growth.
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > I’d also note a lesson from FlinkML. It attempted
> to
> > > cover
> > > > > > > model
> > > > > > > > > > >> training
> > > > > > > > > > >> > > but struggled to align with the fast-iteration,
> > > > > > > > > > Python/notebook-centric
> > > > > > > > > > >> > > workflows preferred by ML researchers. Flink’s
> core
> > > > > strength
> > > > > > > > lies
> > > > > > > > > in
> > > > > > > > > > >> > > high-concurrency, production-grade inference
> > > > > > orchestration—not
> > > > > > > > > > training
> > > > > > > > > > >> > > lifecycle management (e.g., experiment tracking,
> > > > > > versioning).
> > > > > > > > This
> > > > > > > > > > >> > mismatch
> > > > > > > > > > >> > > limited its adoption.
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > This proposal, however, takes a different path. It
> > > doesn’t
> > > > > > aim
> > > > > > > > to
> > > > > > > > > > >> replace
> > > > > > > > > > >> > > training frameworks. Instead, it introduces
> modern AI
> > > > > > concepts
> > > > > > > > > > >> > (multimodal
> > > > > > > > > > >> > > data, LLMs) as first-class citizens for inference,
> > > built
> > > > > > atop
> > > > > > > > > > Flink’s
> > > > > > > > > > >> > > computation strengths. Think Ray Data’s scope
> (plus
> > > simple
> > > > > > > > > > co-located
> > > > > > > > > > >> > > serving), not Train/Tune. Crucially, unlike the
> > > FlinkML
> > > > > era,
> > > > > > > > > today’s
> > > > > > > > > > >> > models
> > > > > > > > > > >> > > use standardized interfaces and mature serving
> > > frameworks,
> > > > > > > > > allowing
> > > > > > > > > > >> Flink
> > > > > > > > > > >> > > to integrate external models seamlessly without
> heavy
> > > > > > > > > > >> > > customization—significantly lowering project risk.
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > This FLIP marks Flink’s another starting point for
> > > the AI
> > > > > > era.
> > > > > > > > > While
> > > > > > > > > > >> > > details need refinement, I believe this direction
> > > aligns
> > > > > > with
> > > > > > > > both
> > > > > > > > > > >> > current
> > > > > > > > > > >> > > and future user needs and Flink’s evolution.
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > Best,
> > > > > > > > > > >> > > Gen
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > On Mon, May 4, 2026 at 12:15 PM Jark Wu <
> > > [email protected]
> > > > > >
> > > > > > > > wrote:
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > > Hi Guowei,
> > > > > > > > > > >> > > >
> > > > > > > > > > >> > > > Thanks for driving this. +1 on the overall
> > > direction.
> > > > > > > Flink's
> > > > > > > > > > >> > > > streaming processing and checkpoint mechanism
> give
> > > it a
> > > > > > > > > structural
> > > > > > > > > > >> > > > advantage over systems like Daft and Ray. But
> today,
> > > > > these
> > > > > > > > > runtime
> > > > > > > > > > >> > > > strengths are held back by gaps in Python API,
> GPU
> > > > > > > scheduling,
> > > > > > > > > and
> > > > > > > > > > >> > > > native multimodal data handling. This umbrella
> FLIP
> > > > > > > addresses
> > > > > > > > > > exactly
> > > > > > > > > > >> > > > that gap, comprehensively and systematically. I
> > > believe
> > > > > > > > > multimodal
> > > > > > > > > > >> > > > data processing is the biggest opportunity for
> > > > > traditional
> > > > > > > > data
> > > > > > > > > > infra
> > > > > > > > > > >> > > > to transition into AI infra, and this is one of
> the
> > > most
> > > > > > > > > important
> > > > > > > > > > >> > > > FLIPs for Flink in the AI era.
> > > > > > > > > > >> > > >
> > > > > > > > > > >> > > > As one of the Table/SQL module maintainers, we
> would
> > > > > like
> > > > > > to
> > > > > > > > > > >> > > > contribute the built-in multimodal processing
> UDFs
> > > > > (audio,
> > > > > > > > > video,
> > > > > > > > > > >> > > > image, text) and native multimodal data types
> > > (Tensor,
> > > > > > > Image,
> > > > > > > > > > >> > > > Embedding, etc.) as first-class citizens in the
> type
> > > > > > system.
> > > > > > > > > > Looking
> > > > > > > > > > >> > > > forward to the sub-FLIP discussions.
> > > > > > > > > > >> > > >
> > > > > > > > > > >> > > > Best,
> > > > > > > > > > >> > > > Jark
> > > > > > > > > > >> > > >
> > > > > > > > > > >> > > > On Thu, 30 Apr 2026 at 18:42, Guowei Ma <
> > > > > > > [email protected]
> > > > > > > > >
> > > > > > > > > > >> wrote:
> > > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > > Hi,Yaroslav
> > > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > > Thanks for taking the time to write this
> detailed
> > > > > > > feedback.
> > > > > > > > > Let
> > > > > > > > > > me
> > > > > > > > > > >> > > > clarify
> > > > > > > > > > >> > > > > the intent of the proposal first.
> > > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > > I am not saying that Flink should become an AI
> > > > > > framework,
> > > > > > > an
> > > > > > > > > ML
> > > > > > > > > > >> > platform,
> > > > > > > > > > >> > > > > or a model serving system. The way I use
> > > "AI-Native"
> > > > > in
> > > > > > > this
> > > > > > > > > > >> > proposal is
> > > > > > > > > > >> > > > to
> > > > > > > > > > >> > > > > say that Flink should support, as first-class
> > > > > citizens,
> > > > > > > the
> > > > > > > > > core
> > > > > > > > > > >> > objects
> > > > > > > > > > >> > > > > and execution patterns that frequently show
> up in
> > > > > > > > AI-oriented
> > > > > > > > > > data
> > > > > > > > > > >> > > > > processing — instead of leaving them entirely
> to
> > > > > > external
> > > > > > > > > > systems
> > > > > > > > > > >> or
> > > > > > > > > > >> > ad
> > > > > > > > > > >> > > > hoc
> > > > > > > > > > >> > > > > user-defined integrations.
> > > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > > These objects and execution patterns include:
> > > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > >    - Multimodal and unstructured data objects
> > > such as
> > > > > > > > images,
> > > > > > > > > > >> video,
> > > > > > > > > > >> > > > audio,
> > > > > > > > > > >> > > > >    tensors, embeddings, and object references.
> > > > > > > > > > >> > > > >    - Model inference as part of the data flow,
> > > rather
> > > > > > than
> > > > > > > > an
> > > > > > > > > > >> > entirely
> > > > > > > > > > >> > > > >    external black-box service call.
> > > > > > > > > > >> > > > >    - Operators backed by heterogeneous
> resources
> > > such
> > > > > as
> > > > > > > > GPUs.
> > > > > > > > > > >> > > > >    - Pythonic and vectorized processing
> styles.
> > > > > > > > > > >> > > > >    - Long-running, long-tailed asynchronous
> > > > > computation.
> > > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > > "AI-Native" is just a shorthand here, meaning
> that
> > > > > Flink
> > > > > > > > > should
> > > > > > > > > > >> > natively
> > > > > > > > > > >> > > > > understand and support the core abstractions
> of
> > > this
> > > > > > class
> > > > > > > > of
> > > > > > > > > > >> > workloads.
> > > > > > > > > > >> > > > > The FLIP needs to make the target workload
> class
> > > > > > clearer.
> > > > > > > > What
> > > > > > > > > > we
> > > > > > > > > > >> > care
> > > > > > > > > > >> > > > > about is not any specific model paradigm —
> LLM,
> > > CV,
> > > > > > > > > > recommendation,
> > > > > > > > > > >> > or
> > > > > > > > > > >> > > > > traditional ML inference — but a class of data
> > > > > > processing
> > > > > > > > > > workloads
> > > > > > > > > > >> > with
> > > > > > > > > > >> > > > > shared runtime and topology characteristics:
> > > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > >    - A single computation may take seconds or
> even
> > > > > > > minutes,
> > > > > > > > > > instead
> > > > > > > > > > >> > of
> > > > > > > > > > >> > > > >    microseconds as in traditional
> row-at-a-time
> > > > > > > processing.
> > > > > > > > > > >> > > > >    - Execution often involves heterogeneous
> > > resources
> > > > > > such
> > > > > > > > as
> > > > > > > > > > CPU +
> > > > > > > > > > >> > GPU,
> > > > > > > > > > >> > > > >    where GPUs are expensive and scarce.
> > > > > > > > > > >> > > > >    - Data is often multimodal large objects
> > > (images,
> > > > > > > video,
> > > > > > > > > > audio,
> > > > > > > > > > >> > > > tensors,
> > > > > > > > > > >> > > > >    embeddings), rather than structured small
> > > records.
> > > > > > > > > > >> > > > >    - Computation logic often includes model
> > > inference
> > > > > or
> > > > > > > > > > >> > service-style
> > > > > > > > > > >> > > > >    invocations as part of the pipeline.
> > > > > > > > > > >> > > > >    - Many target topologies are relatively
> > > > > shuffle-light
> > > > > > > and
> > > > > > > > > > don't
> > > > > > > > > > >> > > > >    necessarily involve complex keyed-state
> > > migration,
> > > > > > e.g.
> > > > > > > > > URI →
> > > > > > > > > > >> > > > preprocessing
> > > > > > > > > > >> > > > >    → inference → sink.
> > > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > > Ten years ago, many ML workloads took the
> form of
> > > > > > offline
> > > > > > > > > > training
> > > > > > > > > > >> > plus
> > > > > > > > > > >> > > > > online feature serving. Flink already played a
> > > strong
> > > > > > role
> > > > > > > > in
> > > > > > > > > > >> feature
> > > > > > > > > > >> > > > > engineering, streaming feature computation,
> and
> > > > > > real-time
> > > > > > > > data
> > > > > > > > > > >> > > > preparation,
> > > > > > > > > > >> > > > > so there was no strong need to reshape Flink
> into
> > > an
> > > > > > > > > "ML-Native"
> > > > > > > > > > >> > engine.
> > > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > > What is changing today is that model inference
> > > itself
> > > > > is
> > > > > > > > > > >> increasingly
> > > > > > > > > > >> > > > > becoming part of the data processing pipeline;
> > > > > > multimodal
> > > > > > > > > > objects
> > > > > > > > > > >> > are no
> > > > > > > > > > >> > > > > longer just opaque blobs in external storage,
> but
> > > data
> > > > > > > > objects
> > > > > > > > > > that
> > > > > > > > > > >> > need
> > > > > > > > > > >> > > > to
> > > > > > > > > > >> > > > > be referenced, passed, transformed, inferred
> > > over, and
> > > > > > > > landed
> > > > > > > > > > >> inside
> > > > > > > > > > >> > the
> > > > > > > > > > >> > > > > engine. This is not simply one more ML use
> case —
> > > it
> > > > > is
> > > > > > a
> > > > > > > > > > change in
> > > > > > > > > > >> > the
> > > > > > > > > > >> > > > > shape of workloads Flink needs to support.
> > > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > > On whether the user demand is real, the
> validation
> > > > > > signals
> > > > > > > > we
> > > > > > > > > > are
> > > > > > > > > > >> > > > currently
> > > > > > > > > > >> > > > > seeing include:
> > > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > >    - Within Alibaba, multimodal data
> processing is
> > > > > > already
> > > > > > > > in
> > > > > > > > > > >> > production,
> > > > > > > > > > >> > > > >    covering image, video, audio, and text
> > > modalities.
> > > > > > > > > > >> > > > >    - In offline conversations with several
> > > companies
> > > > > > > > > (including
> > > > > > > > > > >> > ByteDance
> > > > > > > > > > >> > > > >    and Tencent), we have heard substantial
> demand
> > > for
> > > > > > > Flink
> > > > > > > > to
> > > > > > > > > > >> > support
> > > > > > > > > > >> > > > AI data
> > > > > > > > > > >> > > > >    processing / multimodal data processing.
> > > > > > > > > > >> > > > >    - On the ecosystem side, we are working
> with
> > > NVIDIA
> > > > > > on
> > > > > > > a
> > > > > > > > > > joint
> > > > > > > > > > >> > demo
> > > > > > > > > > >> > > > >    focused on multimodal data processing,
> planned
> > > for
> > > > > > > Flink
> > > > > > > > > > Forward
> > > > > > > > > > >> > Asia.
> > > > > > > > > > >> > > > >    - The emergence and growth of systems such
> as
> > > Daft,
> > > > > > Ray
> > > > > > > > > Data,
> > > > > > > > > > >> > > > >    Data-Juicer, and LAS also reflect rapidly
> > > growing
> > > > > > > demand
> > > > > > > > > for
> > > > > > > > > > >> > > > multimodal
> > > > > > > > > > >> > > > >    data processing.
> > > > > > > > > > >> > > > >    - There have also been independent
> discussions
> > > in
> > > > > > this
> > > > > > > > > > direction
> > > > > > > > > > >> > > > within
> > > > > > > > > > >> > > > >    the community — for example, the
> > > "Streaming-native
> > > > > AI
> > > > > > > > > > Inference
> > > > > > > > > > >> > > > Runtime
> > > > > > > > > > >> > > > >    Layer" proposal on the dev list.
> > > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > > On "why now, instead of waiting for
> > > standardization"
> > > > > — I
> > > > > > > > > > understand
> > > > > > > > > > >> > the
> > > > > > > > > > >> > > > > concern. LLM-related frameworks, APIs, and
> > > > > > > application-level
> > > > > > > > > > >> > patterns are
> > > > > > > > > > >> > > > > indeed changing quickly. If this FLIP were
> trying
> > > to
> > > > > > bake
> > > > > > > a
> > > > > > > > > > >> specific
> > > > > > > > > > >> > LLM
> > > > > > > > > > >> > > > > API, agent framework, or prompt protocol into
> > > Flink,
> > > > > the
> > > > > > > > risk
> > > > > > > > > > would
> > > > > > > > > > >> > be
> > > > > > > > > > >> > > > high.
> > > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > > But most of the capabilities in this proposal
> are
> > > not
> > > > > > > > > > LLM-specific.
> > > > > > > > > > >> > They
> > > > > > > > > > >> > > > > are more fundamental data processing and
> runtime
> > > > > > > > capabilities:
> > > > > > > > > > >> > Pipeline
> > > > > > > > > > >> > > > > Region-level checkpointing, Object Reference,
> GPU
> > > > > > resource
> > > > > > > > > > >> > declaration,
> > > > > > > > > > >> > > > > columnar data transfer, service-style operator
> > > > > > invocation,
> > > > > > > > > > >> > long-running
> > > > > > > > > > >> > > > > async execution. These are useful for today's
> LLM
> > > > > > > workloads,
> > > > > > > > > and
> > > > > > > > > > >> > equally
> > > > > > > > > > >> > > > > useful for future AI workloads in shapes we
> cannot
> > > > > fully
> > > > > > > > > predict
> > > > > > > > > > >> > yet. The
> > > > > > > > > > >> > > > > fast-changing parts should live in the
> ecosystem
> > > and
> > > > > SDK
> > > > > > > > > layer;
> > > > > > > > > > the
> > > > > > > > > > >> > FLIP
> > > > > > > > > > >> > > > > should focus on more stable engine-level
> > > capabilities.
> > > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > > On tactical changes vs. umbrella, I partly
> agree
> > > with
> > > > > > you.
> > > > > > > > > Each
> > > > > > > > > > >> > sub-FLIP
> > > > > > > > > > >> > > > > should be discussed, reviewed, and accepted or
> > > > > rejected
> > > > > > on
> > > > > > > > its
> > > > > > > > > > own
> > > > > > > > > > >> > > > merits.
> > > > > > > > > > >> > > > > The umbrella should not bypass the normal FLIP
> > > > > process,
> > > > > > > and
> > > > > > > > > > >> > accepting the
> > > > > > > > > > >> > > > > umbrella does not mean accepting all
> sub-FLIPs.
> > > That
> > > > > > > said, I
> > > > > > > > > > still
> > > > > > > > > > >> > think
> > > > > > > > > > >> > > > > the umbrella is valuable. Its purpose is not
> to
> > > bind
> > > > > the
> > > > > > > 11
> > > > > > > > > > changes
> > > > > > > > > > >> > into
> > > > > > > > > > >> > > > a
> > > > > > > > > > >> > > > > single inseparable package, but to help the
> > > community
> > > > > > > align
> > > > > > > > on
> > > > > > > > > > >> > > > principles,
> > > > > > > > > > >> > > > > clarify boundaries and dependencies, and avoid
> > > > > > conflicting
> > > > > > > > or
> > > > > > > > > > >> > duplicated
> > > > > > > > > > >> > > > > abstractions across related capabilities.
> > > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > > For example, if RpcOperator is not considered
> > > together
> > > > > > > with
> > > > > > > > > > >> > > > non-disruptive
> > > > > > > > > > >> > > > > scaling, it is hard to give GPU operator
> > > elasticity
> > > > > > > coherent
> > > > > > > > > > >> > semantics.
> > > > > > > > > > >> > > > > Deploying inference services independently is
> > > only the
> > > > > > > first
> > > > > > > > > > step;
> > > > > > > > > > >> > the
> > > > > > > > > > >> > > > > harder question is how Flink uniformly handles
> > > service
> > > > > > > > > > discovery,
> > > > > > > > > > >> > > > in-flight
> > > > > > > > > > >> > > > > request draining, backpressure, and failover
> > > during
> > > > > > > scaling.
> > > > > > > > > > >> Without
> > > > > > > > > > >> > an
> > > > > > > > > > >> > > > > umbrella, these capabilities can certainly be
> > > advanced
> > > > > > as
> > > > > > > > > > tactical
> > > > > > > > > > >> > > > changes,
> > > > > > > > > > >> > > > > but we may end up with a set of abstractions
> that
> > > are
> > > > > > > > locally
> > > > > > > > > > >> usable
> > > > > > > > > > >> > but
> > > > > > > > > > >> > > > > globally inconsistent.
> > > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > > On RpcOperator, I agree that we need to be
> very
> > > > > careful
> > > > > > in
> > > > > > > > > > defining
> > > > > > > > > > >> > the
> > > > > > > > > > >> > > > > boundary between the Flink runtime and
> external
> > > > > > > > orchestration
> > > > > > > > > > >> > systems.
> > > > > > > > > > >> > > > > Kubernetes or the Kubernetes Operator may
> well be
> > > the
> > > > > > > right
> > > > > > > > > > choice
> > > > > > > > > > >> > at the
> > > > > > > > > > >> > > > > physical deployment level. But I still believe
> > > Flink
> > > > > > > needs a
> > > > > > > > > > >> > first-class
> > > > > > > > > > >> > > > > RpcOperator abstraction, because deployment is
> > > only
> > > > > part
> > > > > > > of
> > > > > > > > > the
> > > > > > > > > > >> > problem —
> > > > > > > > > > >> > > > > the harder part is its semantic integration
> with
> > > the
> > > > > > Flink
> > > > > > > > > job.
> > > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > > If model inference is part of the logical data
> > > flow,
> > > > > > Flink
> > > > > > > > > > needs at
> > > > > > > > > > >> > > > minimum
> > > > > > > > > > >> > > > > to be aware of its service discovery,
> backpressure
> > > > > > > behavior,
> > > > > > > > > > >> failover
> > > > > > > > > > >> > > > > behavior, in-flight request draining, and
> scaling
> > > > > > > > > coordination.
> > > > > > > > > > If
> > > > > > > > > > >> > it is
> > > > > > > > > > >> > > > > hidden entirely behind an external black-box
> > > service,
> > > > > it
> > > > > > > is
> > > > > > > > > hard
> > > > > > > > > > >> for
> > > > > > > > > > >> > > > Flink
> > > > > > > > > > >> > > > > to provide consistent job-level semantics and
> > > > > > operational
> > > > > > > > > > >> experience.
> > > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > > So the point of RpcOperator is not necessarily
> > > that
> > > > > > "every
> > > > > > > > > > physical
> > > > > > > > > > >> > > > process
> > > > > > > > > > >> > > > > must be directly launched and managed by Flink
> > > core,"
> > > > > > but
> > > > > > > > that
> > > > > > > > > > >> Flink
> > > > > > > > > > >> > > > needs
> > > > > > > > > > >> > > > > to define a service-style operator contract
> that
> > > > > allows
> > > > > > > such
> > > > > > > > > > >> > operators to
> > > > > > > > > > >> > > > > be invoked correctly by the data flow,
> coordinated
> > > > > > > correctly
> > > > > > > > > by
> > > > > > > > > > the
> > > > > > > > > > >> > > > > runtime, and understood and operated by users
> as
> > > part
> > > > > > of a
> > > > > > > > > Flink
> > > > > > > > > > >> job.
> > > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > > On vectorized batch processing, I agree the
> > > long-term
> > > > > > > > > direction
> > > > > > > > > > >> > should
> > > > > > > > > > >> > > > not
> > > > > > > > > > >> > > > > stop at Python. Native columnar / vectorized
> > > execution
> > > > > > is
> > > > > > > an
> > > > > > > > > > >> > end-to-end
> > > > > > > > > > >> > > > > problem that touches connectors, formats, the
> type
> > > > > > system,
> > > > > > > > > > runtime,
> > > > > > > > > > >> > Java,
> > > > > > > > > > >> > > > > SQL, and Python. The current proposal starts
> from
> > > the
> > > > > > > > > > Java/Python
> > > > > > > > > > >> > > > boundary
> > > > > > > > > > >> > > > > because that is where the row/column
> conversion
> > > > > overhead
> > > > > > > is
> > > > > > > > > most
> > > > > > > > > > >> > visible.
> > > > > > > > > > >> > > > > End-to-end columnar execution on the Java and
> SQL
> > > side
> > > > > > > > > deserves
> > > > > > > > > > to
> > > > > > > > > > >> be
> > > > > > > > > > >> > > > > discussed further as a separate, larger FLIP.
> > > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > > On multimodal types and SerDes complexity, I
> agree
> > > > > this
> > > > > > > > needs
> > > > > > > > > > to be
> > > > > > > > > > >> > > > handled
> > > > > > > > > > >> > > > > carefully. Making AI-related objects
> first-class
> > > does
> > > > > > not
> > > > > > > > > imply
> > > > > > > > > > >> that
> > > > > > > > > > >> > > > every
> > > > > > > > > > >> > > > > connector must immediately and fully support
> > > image,
> > > > > > video,
> > > > > > > > > > audio,
> > > > > > > > > > >> > tensor,
> > > > > > > > > > >> > > > > and so on. The concrete incremental path,
> fallback
> > > > > > > strategy,
> > > > > > > > > and
> > > > > > > > > > >> the
> > > > > > > > > > >> > > > > boundary between formats, connector API, and
> the
> > > type
> > > > > > > system
> > > > > > > > > > will
> > > > > > > > > > >> be
> > > > > > > > > > >> > > > > discussed further in the corresponding
> sub-FLIPs.
> > > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > > Coming back to the core of the proposal: it
> is not
> > > > > about
> > > > > > > > > turning
> > > > > > > > > > >> > Flink
> > > > > > > > > > >> > > > into
> > > > > > > > > > >> > > > > an AI framework. It is about making the core
> > > objects
> > > > > and
> > > > > > > > > > execution
> > > > > > > > > > >> > > > patterns
> > > > > > > > > > >> > > > > of AI-oriented data processing first-class
> > > citizens in
> > > > > > > > Flink.
> > > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > > Best,
> > > > > > > > > > >> > > > > Guowei
> > > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > > On Thu, Apr 30, 2026 at 5:37 AM Yaroslav
> > > Tkachenko <
> > > > > > > > > > >> > [email protected]
> > > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > > wrote:
> > > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > > > Hi Guowei,
> > > > > > > > > > >> > > > > >
> > > > > > > > > > >> > > > > > Thank you for writing this proposal.
> > > > > > > > > > >> > > > > >
> > > > > > > > > > >> > > > > > I may be in the minority here, but I hope my
> > > voice
> > > > > > will
> > > > > > > be
> > > > > > > > > > >> heard. I
> > > > > > > > > > >> > > > > > disagree with turning Flink into an
> "AI-Native"
> > > > > > engine.
> > > > > > > > > > >> > > > > >
> > > > > > > > > > >> > > > > > Regarding your "Data processing is entering
> the
> > > AI
> > > > > > era,
> > > > > > > > and
> > > > > > > > > > Flink
> > > > > > > > > > >> > > > needs to
> > > > > > > > > > >> > > > > > evolve from a traditional BI compute engine
> > > into a
> > > > > > data
> > > > > > > > > engine
> > > > > > > > > > >> that
> > > > > > > > > > >> > > > > > natively supports AI workloads" claim:
> > > > > > > > > > >> > > > > >
> > > > > > > > > > >> > > > > > - How exactly do you define "AI"? I don't
> > > believe
> > > > > > there
> > > > > > > > is a
> > > > > > > > > > >> > standard
> > > > > > > > > > >> > > > > > definition. For example, Machine Learning
> have
> > > been
> > > > > > > around
> > > > > > > > > for
> > > > > > > > > > >> more
> > > > > > > > > > >> > > > than a
> > > > > > > > > > >> > > > > > decade, but there were no proposals (or
> need,
> > > in my
> > > > > > > > opinion)
> > > > > > > > > > to
> > > > > > > > > > >> > turn
> > > > > > > > > > >> > > > Flink
> > > > > > > > > > >> > > > > > into an "ML-Native" engine. Flink, in its
> > > current
> > > > > > state,
> > > > > > > > has
> > > > > > > > > > >> > > > > > been successfully used in many systems
> alongside
> > > > > > > dedicated
> > > > > > > > > ML
> > > > > > > > > > >> > > > technologies,
> > > > > > > > > > >> > > > > > like feature stores. Based on the context of
> > > your
> > > > > > > > proposal,
> > > > > > > > > it
> > > > > > > > > > >> > looks
> > > > > > > > > > >> > > > like
> > > > > > > > > > >> > > > > > you mostly mean LLMs, so could you be
> specific
> > > about
> > > > > > the
> > > > > > > > > > >> language?
> > > > > > > > > > >> > > > > > - I wouldn't call Flink "a traditional BI
> > > compute
> > > > > > > engine".
> > > > > > > > > > Flink
> > > > > > > > > > >> > is a
> > > > > > > > > > >> > > > > > general data processing technology which
> can be
> > > used
> > > > > > > for a
> > > > > > > > > > >> variety
> > > > > > > > > > >> > of
> > > > > > > > > > >> > > > use
> > > > > > > > > > >> > > > > > cases without any BI involvement.
> > > > > > > > > > >> > > > > > - Do you have any proof that "Users' core
> > > workloads
> > > > > > are
> > > > > > > > > > rapidly
> > > > > > > > > > >> > > > evolving"
> > > > > > > > > > >> > > > > > and that they require your proposed changes?
> > > Case
> > > > > > > studies,
> > > > > > > > > > user
> > > > > > > > > > >> > > > surveys, or
> > > > > > > > > > >> > > > > > submitted issues about the lack of support?
> Big
> > > > > > changes
> > > > > > > > like
> > > > > > > > > > that
> > > > > > > > > > >> > > > require
> > > > > > > > > > >> > > > > > extensive validation.
> > > > > > > > > > >> > > > > > - And even if there is a real need to adopt
> some
> > > > > > > > LLM-driven
> > > > > > > > > > >> > changes,
> > > > > > > > > > >> > > > why
> > > > > > > > > > >> > > > > > now? The LLM-related tooling has been
> changing
> > > so
> > > > > > > rapidly,
> > > > > > > > > and
> > > > > > > > > > >> it's
> > > > > > > > > > >> > > > hard to
> > > > > > > > > > >> > > > > > predict what will be needed tomorrow. Why
> does
> > > it
> > > > > make
> > > > > > > > sense
> > > > > > > > > > to
> > > > > > > > > > >> > > > introduce
> > > > > > > > > > >> > > > > > changes now, and not wait for more
> > > standardization
> > > > > and
> > > > > > > > > > >> > consolidation?
> > > > > > > > > > >> > > > > >
> > > > > > > > > > >> > > > > > To summarize, I think there are a lot of
> great
> > > ideas
> > > > > > in
> > > > > > > > the
> > > > > > > > > > >> > proposal,
> > > > > > > > > > >> > > > but
> > > > > > > > > > >> > > > > > in my mind, they need to be addressed as
> > > tactical,
> > > > > > > focused
> > > > > > > > > > >> > changes, not
> > > > > > > > > > >> > > > > > under the "AI-Native" umbrella.
> > > > > > > > > > >> > > > > >
> > > > > > > > > > >> > > > > > I also wanted to address a few more specific
> > > points:
> > > > > > > > > > >> > > > > >
> > > > > > > > > > >> > > > > > - RpcOperator, why does it need to be
> managed by
> > > > > > Flink?
> > > > > > > I
> > > > > > > > > see
> > > > > > > > > > >> > > > absolutely no
> > > > > > > > > > >> > > > > > need to introduce the additional complexity
> of
> > > > > > > > orchestrating
> > > > > > > > > > >> > standalone
> > > > > > > > > > >> > > > > > components into the core Flink engine. I can
> > > > > imagine a
> > > > > > > > > > separate
> > > > > > > > > > >> > > > sub-project
> > > > > > > > > > >> > > > > > for an RpcOperator, which could potentially
> be
> > > > > managed
> > > > > > > by
> > > > > > > > > the
> > > > > > > > > > >> > > > Kubernetes
> > > > > > > > > > >> > > > > > Operator.
> > > > > > > > > > >> > > > > > - You make the case for the vectorized batch
> > > > > > processing,
> > > > > > > > but
> > > > > > > > > > only
> > > > > > > > > > >> > on
> > > > > > > > > > >> > > > the
> > > > > > > > > > >> > > > > > Python side. Why stop there? Native columnar
> > > > > > vectorized
> > > > > > > > > > execution
> > > > > > > > > > >> > will
> > > > > > > > > > >> > > > > > require end-to-end changes, including
> > > connectors,
> > > > > data
> > > > > > > > > format
> > > > > > > > > > >> > support,
> > > > > > > > > > >> > > > Type
> > > > > > > > > > >> > > > > > system support, runtime changes, etc. It
> seems
> > > > > logical
> > > > > > > to
> > > > > > > > me
> > > > > > > > > > to
> > > > > > > > > > >> > support
> > > > > > > > > > >> > > > > > this execution mode for Java and SQL as
> well.
> > > > > > > > > > >> > > > > > - Supporting many more data types natively
> > > (images,
> > > > > > > video,
> > > > > > > > > > audio,
> > > > > > > > > > >> > > > tensors)
> > > > > > > > > > >> > > > > > will make connector serializers and
> > > deserializers
> > > > > > > (SerDes)
> > > > > > > > > > much
> > > > > > > > > > >> > more
> > > > > > > > > > >> > > > > > challenging to implement. Even today, many
> > > SerDes in
> > > > > > > > > > officially
> > > > > > > > > > >> > > > supported
> > > > > > > > > > >> > > > > > connectors don't fully implement types like
> > > arrays
> > > > > and
> > > > > > > > > > structs.
> > > > > > > > > > >> > > > > >
> > > > > > > > > > >> > > > > > Thank you.
> > > > > > > > > > >> > > > > >
> > > > > > > > > > >> > > > > > On Wed, Apr 29, 2026 at 1:18 AM Guowei Ma <
> > > > > > > > > > [email protected]>
> > > > > > > > > > >> > > > wrote:
> > > > > > > > > > >> > > > > >
> > > > > > > > > > >> > > > > > > Hi Z
> > > > > > > > > > >> > > > > > >
> > > > > > > > > > >> > > > > > > Thanks for the kind words and the
> thoughtful
> > > > > > > questions.
> > > > > > > > > Let
> > > > > > > > > > me
> > > > > > > > > > >> > take
> > > > > > > > > > >> > > > them
> > > > > > > > > > >> > > > > > > one by one.
> > > > > > > > > > >> > > > > > >
> > > > > > > > > > >> > > > > > >    1. Throughput and latency targets
> > > > > > > > > > >> > > > > > >
> > > > > > > > > > >> > > > > > > To be honest, I don't have concrete
> numbers to
> > > > > share
> > > > > > > > yet.
> > > > > > > > > > What
> > > > > > > > > > >> I
> > > > > > > > > > >> > can
> > > > > > > > > > >> > > > say
> > > > > > > > > > >> > > > > > is
> > > > > > > > > > >> > > > > > > that our internal testing has already
> surfaced
> > > > > > several
> > > > > > > > > > >> directions
> > > > > > > > > > >> > > > where
> > > > > > > > > > >> > > > > > > Flink can be improved, and at the same
> time we
> > > > > want
> > > > > > to
> > > > > > > > > fully
> > > > > > > > > > >> > leverage
> > > > > > > > > > >> > > > > > > Flink's existing streaming shuffle
> > > capabilities.
> > > > > As
> > > > > > > the
> > > > > > > > > > >> > multimodal
> > > > > > > > > > >> > > > > > operator
> > > > > > > > > > >> > > > > > > library matures, we'll progressively
> publish
> > > > > > benchmark
> > > > > > > > > > results.
> > > > > > > > > > >> > > > > > >
> > > > > > > > > > >> > > > > > >    2. Built-in operators
> > > > > > > > > > >> > > > > > >
> > > > > > > > > > >> > > > > > > You're absolutely right. From what I've
> seen,
> > > our
> > > > > > > > internal
> > > > > > > > > > >> users
> > > > > > > > > > >> > > > already
> > > > > > > > > > >> > > > > > > rely on a fairly large set of multimodal
> > > > > operators —
> > > > > > > > > > >> potentially
> > > > > > > > > > >> > > > 100+.
> > > > > > > > > > >> > > > > > The
> > > > > > > > > > >> > > > > > > exact set the community should provide is
> best
> > > > > > > discussed
> > > > > > > > > in
> > > > > > > > > > >> > FLIP-XXX:
> > > > > > > > > > >> > > > > > > Built-in Multimodal Operators and AI
> > > Functions,
> > > > > and
> > > > > > > > > > >> contributions
> > > > > > > > > > >> > > > from
> > > > > > > > > > >> > > > > > the
> > > > > > > > > > >> > > > > > > community are very welcome there.
> > > > > > > > > > >> > > > > > >
> > > > > > > > > > >> > > > > > >    3. Plan for the 11 sub-FLIPs
> > > > > > > > > > >> > > > > > >
> > > > > > > > > > >> > > > > > > The sequencing follows the layering in the
> > > > > umbrella:
> > > > > > > > > > >> > > > > > >
> > > > > > > > > > >> > > > > > >    - Layer 1 (Core Primitives) should be
> > > discussed
> > > > > > and
> > > > > > > > > > aligned
> > > > > > > > > > >> > first,
> > > > > > > > > > >> > > > > > since
> > > > > > > > > > >> > > > > > >    the second and third layers build on
> it.
> > > > > > > > > > >> > > > > > >    - Layer 2 (API + compilation +
> single-node
> > > > > > > execution)
> > > > > > > > > > starts
> > > > > > > > > > >> > with
> > > > > > > > > > >> > > > > > >    getting the API discussion right — the
> > > Python
> > > > > > API,
> > > > > > > > how
> > > > > > > > > > UDFs
> > > > > > > > > > >> > > > declare
> > > > > > > > > > >> > > > > > >    resources, etc. — after which the
> > > single-node
> > > > > > > > execution
> > > > > > > > > > work
> > > > > > > > > > >> > can
> > > > > > > > > > >> > > > build
> > > > > > > > > > >> > > > > > > on
> > > > > > > > > > >> > > > > > >    top.
> > > > > > > > > > >> > > > > > >    - Layer 3 (distributed scheduling and
> > > > > > > checkpointing)
> > > > > > > > > can
> > > > > > > > > > >> > largely
> > > > > > > > > > >> > > > > > proceed
> > > > > > > > > > >> > > > > > >    independently in parallel.
> > > > > > > > > > >> > > > > > >
> > > > > > > > > > >> > > > > > > So while each sub-FLIP is indeed a
> substantial
> > > > > piece
> > > > > > > of
> > > > > > > > > > work,
> > > > > > > > > > >> > most of
> > > > > > > > > > >> > > > > > them
> > > > > > > > > > >> > > > > > > can be advanced in parallel by different
> > > > > > contributors
> > > > > > > > once
> > > > > > > > > > the
> > > > > > > > > > >> > Layer
> > > > > > > > > > >> > > > 1
> > > > > > > > > > >> > > > > > > primitives are settled.
> > > > > > > > > > >> > > > > > >
> > > > > > > > > > >> > > > > > >    4. GPU scheduling roadmap
> > > > > > > > > > >> > > > > > >
> > > > > > > > > > >> > > > > > > Could you expand a bit on which aspect of
> GPU
> > > > > > > scheduling
> > > > > > > > > you
> > > > > > > > > > >> > have in
> > > > > > > > > > >> > > > mind
> > > > > > > > > > >> > > > > > > as the complex one? "GPU scheduling"
> covers a
> > > > > fairly
> > > > > > > > wide
> > > > > > > > > > >> surface
> > > > > > > > > > >> > > > area
> > > > > > > > > > >> > > > > > > (resource declaration, operator-level
> > > deployment,
> > > > > > > > elastic
> > > > > > > > > > >> > scaling,
> > > > > > > > > > >> > > > > > > heterogeneous GPU types, fine-grained
> > > > > partitioning,
> > > > > > > > etc.),
> > > > > > > > > > and
> > > > > > > > > > >> > the
> > > > > > > > > > >> > > > answer
> > > > > > > > > > >> > > > > > > differs quite a bit depending on which
> > > dimension
> > > > > > we're
> > > > > > > > > > >> > discussing.
> > > > > > > > > > >> > > > Once I
> > > > > > > > > > >> > > > > > > understand your specific concern I can
> give a
> > > more
> > > > > > > > useful
> > > > > > > > > > >> > response.
> > > > > > > > > > >> > > > > > >
> > > > > > > > > > >> > > > > > > Thanks again for the support — looking
> > > forward to
> > > > > > the
> > > > > > > > > > continued
> > > > > > > > > > >> > > > > > discussion.
> > > > > > > > > > >> > > > > > >
> > > > > > > > > > >> > > > > > > Best,
> > > > > > > > > > >> > > > > > > Guowei
> > > > > > > > > > >> > > > > > >
> > > > > > > > > > >> > > > > > >
> > > > > > > > > > >> > > > > > > On Tue, Apr 28, 2026 at 4:34 PM zl z <
> > > > > > > > > [email protected]
> > > > > > > > > > >
> > > > > > > > > > >> > wrote:
> > > > > > > > > > >> > > > > > >
> > > > > > > > > > >> > > > > > > > Hey Guowei,
> > > > > > > > > > >> > > > > > > >
> > > > > > > > > > >> > > > > > > > Thanks for the proposal, and I think
> this is
> > > > > very
> > > > > > > > > > valuable. I
> > > > > > > > > > >> > have
> > > > > > > > > > >> > > > some
> > > > > > > > > > >> > > > > > > > question about it:
> > > > > > > > > > >> > > > > > > >
> > > > > > > > > > >> > > > > > > > 1. What are our expected throughput and
> > > latency
> > > > > > > > targets?
> > > > > > > > > > Do
> > > > > > > > > > >> we
> > > > > > > > > > >> > > > have any
> > > > > > > > > > >> > > > > > > > forward-looking tests for this?
> > > > > > > > > > >> > > > > > > >
> > > > > > > > > > >> > > > > > > > 2. AI involves a very large number of
> > > operators.
> > > > > > > > Besides
> > > > > > > > > > >> > allowing
> > > > > > > > > > >> > > > users
> > > > > > > > > > >> > > > > > > to
> > > > > > > > > > >> > > > > > > > use them through UDFs, will we also
> provide
> > > > > > commonly
> > > > > > > > > used
> > > > > > > > > > >> > built-in
> > > > > > > > > > >> > > > > > > > operators?
> > > > > > > > > > >> > > > > > > >
> > > > > > > > > > >> > > > > > > > 3. Each of the 11 sub-FLIPs is a major
> > > project
> > > > > > > > > involving a
> > > > > > > > > > >> > > > significant
> > > > > > > > > > >> > > > > > > > amount of changes. What is our plan for
> > > this?
> > > > > > > > > > >> > > > > > > >
> > > > > > > > > > >> > > > > > > > 4. GPU scheduling is extremely complex.
> > > What is
> > > > > > our
> > > > > > > > > > current
> > > > > > > > > > >> > > > roadmap for
> > > > > > > > > > >> > > > > > > > this?
> > > > > > > > > > >> > > > > > > >
> > > > > > > > > > >> > > > > > > > This is a very high-quality and exciting
> > > > > proposal.
> > > > > > > > > Making
> > > > > > > > > > >> > Flink an
> > > > > > > > > > >> > > > > > > > AI-native data processing engine will
> make
> > > it
> > > > > far
> > > > > > > more
> > > > > > > > > > >> > valuable in
> > > > > > > > > > >> > > > the
> > > > > > > > > > >> > > > > > AI
> > > > > > > > > > >> > > > > > > > era. Look forward to seeing it land and
> > > come to
> > > > > > > > fruition
> > > > > > > > > > >> soon.
> > > > > > > > > > >> > > > > > > >
> > > > > > > > > > >> > > > > > > > Robert Metzger <[email protected]>
> > > > > > 于2026年4月28日周二
> > > > > > > > > > 14:38写道:
> > > > > > > > > > >> > > > > > > >
> > > > > > > > > > >> > > > > > > > > Hey Guowei,
> > > > > > > > > > >> > > > > > > > >
> > > > > > > > > > >> > > > > > > > > Thanks for the proposal. I just took a
> > > brief
> > > > > > look,
> > > > > > > > > here
> > > > > > > > > > are
> > > > > > > > > > >> > some
> > > > > > > > > > >> > > > high
> > > > > > > > > > >> > > > > > > > level
> > > > > > > > > > >> > > > > > > > > questions:
> > > > > > > > > > >> > > > > > > > >
> > > > > > > > > > >> > > > > > > > > Regarding the RPC Operator: What is
> the
> > > > > > difference
> > > > > > > > to
> > > > > > > > > > the
> > > > > > > > > > >> > async
> > > > > > > > > > >> > > > io
> > > > > > > > > > >> > > > > > > > operator
> > > > > > > > > > >> > > > > > > > > we have already?
> > > > > > > > > > >> > > > > > > > >
> > > > > > > > > > >> > > > > > > > > "Connector API for Multimodal Data
> > > > > Source/Sink":
> > > > > > > Why
> > > > > > > > > do
> > > > > > > > > > we
> > > > > > > > > > >> > need
> > > > > > > > > > >> > > > to
> > > > > > > > > > >> > > > > > > touch
> > > > > > > > > > >> > > > > > > > > the connector API for supporting
> > > multimodal
> > > > > > data?
> > > > > > > > > Isn't
> > > > > > > > > > >> this
> > > > > > > > > > >> > > > more of
> > > > > > > > > > >> > > > > > a
> > > > > > > > > > >> > > > > > > > > formats concern?
> > > > > > > > > > >> > > > > > > > >
> > > > > > > > > > >> > > > > > > > > "Non-Disruptive Scaling for CPU
> > > Operators":
> > > > > How
> > > > > > do
> > > > > > > > you
> > > > > > > > > > want
> > > > > > > > > > >> > to
> > > > > > > > > > >> > > > > > > guarantee
> > > > > > > > > > >> > > > > > > > > exactly-once on that kind of scaling?
> > > E.g. you
> > > > > > > need
> > > > > > > > to
> > > > > > > > > > >> > somehow
> > > > > > > > > > >> > > > make a
> > > > > > > > > > >> > > > > > > > > handover between the old and new new
> > > pipeline
> > > > > > > > > > >> > > > > > > > >
> > > > > > > > > > >> > > > > > > > > Overall, I find the proposal has some
> > > things
> > > > > > which
> > > > > > > > > seem
> > > > > > > > > > >> > related
> > > > > > > > > > >> > > > to
> > > > > > > > > > >> > > > > > > making
> > > > > > > > > > >> > > > > > > > > Flink more AI native, but other
> changes
> > > seem
> > > > > > > > > orthogonal
> > > > > > > > > > to
> > > > > > > > > > >> > that.
> > > > > > > > > > >> > > > For
> > > > > > > > > > >> > > > > > > > > example the checkpoint or scaling
> changes
> > > are
> > > > > > > > actually
> > > > > > > > > > >> > unrelated
> > > > > > > > > > >> > > > to
> > > > > > > > > > >> > > > > > AI,
> > > > > > > > > > >> > > > > > > > and
> > > > > > > > > > >> > > > > > > > > just engine improvements.
> > > > > > > > > > >> > > > > > > > >
> > > > > > > > > > >> > > > > > > > >
> > > > > > > > > > >> > > > > > > > > On Tue, Apr 28, 2026 at 5:48 AM
> Guowei Ma
> > > <
> > > > > > > > > > >> > [email protected]>
> > > > > > > > > > >> > > > > > > wrote:
> > > > > > > > > > >> > > > > > > > >
> > > > > > > > > > >> > > > > > > > > > Hi everyone,
> > > > > > > > > > >> > > > > > > > > >
> > > > > > > > > > >> > > > > > > > > > I'd like to start a discussion on an
> > > > > umbrella
> > > > > > > > > FLIP[1]
> > > > > > > > > > >> that
> > > > > > > > > > >> > lays
> > > > > > > > > > >> > > > > > out a
> > > > > > > > > > >> > > > > > > > > > direction for evolving Flink into a
> data
> > > > > > engine
> > > > > > > > that
> > > > > > > > > > >> > natively
> > > > > > > > > > >> > > > > > > supports
> > > > > > > > > > >> > > > > > > > AI
> > > > > > > > > > >> > > > > > > > > > workloads.
> > > > > > > > > > >> > > > > > > > > >
> > > > > > > > > > >> > > > > > > > > > The short version: user workloads
> are
> > > > > shifting
> > > > > > > > from
> > > > > > > > > BI
> > > > > > > > > > >> > > > analytics to
> > > > > > > > > > >> > > > > > > > > > multimodal data processing centered
> on
> > > model
> > > > > > > > > > inference,
> > > > > > > > > > >> and
> > > > > > > > > > >> > > > this
> > > > > > > > > > >> > > > > > > > triggers
> > > > > > > > > > >> > > > > > > > > > cascading changes across the stack —
> > > > > > multimodal
> > > > > > > > data
> > > > > > > > > > >> > flowing
> > > > > > > > > > >> > > > > > through
> > > > > > > > > > >> > > > > > > > > > pipelines, heterogeneous CPU/GPU
> > > resources,
> > > > > > > > > vectorized
> > > > > > > > > > >> > > > execution,
> > > > > > > > > > >> > > > > > and
> > > > > > > > > > >> > > > > > > > > > inference tasks that run for
> seconds to
> > > > > > minutes
> > > > > > > on
> > > > > > > > > > Spot
> > > > > > > > > > >> > > > instances.
> > > > > > > > > > >> > > > > > > The
> > > > > > > > > > >> > > > > > > > > > proposal sketches an evolution along
> > > five
> > > > > > > > directions
> > > > > > > > > > >> > > > (development
> > > > > > > > > > >> > > > > > > > > paradigm,
> > > > > > > > > > >> > > > > > > > > > data model, heterogeneous resources,
> > > > > execution
> > > > > > > > > engine,
> > > > > > > > > > >> > fault
> > > > > > > > > > >> > > > > > > > tolerance),
> > > > > > > > > > >> > > > > > > > > > decomposed into 11 sub-FLIPs
> organized
> > > into
> > > > > > > three
> > > > > > > > > > layers:
> > > > > > > > > > >> > core
> > > > > > > > > > >> > > > > > > runtime
> > > > > > > > > > >> > > > > > > > > > primitives, AI workload expression
> and
> > > > > > > execution,
> > > > > > > > > and
> > > > > > > > > > >> > > > > > > production-grade
> > > > > > > > > > >> > > > > > > > > > operational guarantees. Most
> sub-FLIPs
> > > have
> > > > > no
> > > > > > > > hard
> > > > > > > > > > >> > > > dependencies on
> > > > > > > > > > >> > > > > > > > each
> > > > > > > > > > >> > > > > > > > > > other and can be advanced in
> parallel.
> > > > > > > > > > >> > > > > > > > > >
> > > > > > > > > > >> > > > > > > > > > A note on scope, since it's an
> umbrella:
> > > > > > > > > > >> > > > > > > > > >
> > > > > > > > > > >> > > > > > > > > > - In scope here: whether the
> evolution
> > > > > > > directions
> > > > > > > > > are
> > > > > > > > > > >> > > > reasonable,
> > > > > > > > > > >> > > > > > > > whether
> > > > > > > > > > >> > > > > > > > > > each sub-FLIP's motivation and
> proposed
> > > > > > approach
> > > > > > > > are
> > > > > > > > > > >> > > > well-founded,
> > > > > > > > > > >> > > > > > > and
> > > > > > > > > > >> > > > > > > > > > whether the boundaries and
> dependencies
> > > > > > between
> > > > > > > > > > sub-FLIPs
> > > > > > > > > > >> > are
> > > > > > > > > > >> > > > > > clear.
> > > > > > > > > > >> > > > > > > > > > - Out of scope here: detailed
> designs,
> > > API
> > > > > > > > > specifics,
> > > > > > > > > > and
> > > > > > > > > > >> > > > > > > > implementation
> > > > > > > > > > >> > > > > > > > > > plans of individual sub-FLIPs —
> those
> > > will
> > > > > go
> > > > > > > > > through
> > > > > > > > > > >> > their own
> > > > > > > > > > >> > > > > > > FLIPs.
> > > > > > > > > > >> > > > > > > > > > - Consensus criteria: agreement on
> the
> > > > > overall
> > > > > > > > > > direction
> > > > > > > > > > >> is
> > > > > > > > > > >> > > > > > > sufficient
> > > > > > > > > > >> > > > > > > > > for
> > > > > > > > > > >> > > > > > > > > > the umbrella to pass; passing it
> does
> > > not
> > > > > lock
> > > > > > > in
> > > > > > > > > any
> > > > > > > > > > >> > > > sub-FLIP's
> > > > > > > > > > >> > > > > > > > design —
> > > > > > > > > > >> > > > > > > > > > sub-FLIPs may still be adjusted,
> > > deferred,
> > > > > or
> > > > > > > > > > withdrawn
> > > > > > > > > > >> as
> > > > > > > > > > >> > they
> > > > > > > > > > >> > > > > > > > progress.
> > > > > > > > > > >> > > > > > > > > >
> > > > > > > > > > >> > > > > > > > > > All proposed changes are
> incremental —
> > > no
> > > > > > > existing
> > > > > > > > > > API or
> > > > > > > > > > >> > > > behavior
> > > > > > > > > > >> > > > > > is
> > > > > > > > > > >> > > > > > > > > > removed or altered. Compatibility
> > > details
> > > > > are
> > > > > > > > > covered
> > > > > > > > > > at
> > > > > > > > > > >> > the
> > > > > > > > > > >> > > > end of
> > > > > > > > > > >> > > > > > > the
> > > > > > > > > > >> > > > > > > > > > document.
> > > > > > > > > > >> > > > > > > > > >
> > > > > > > > > > >> > > > > > > > > > Looking forward to your feedback on
> the
> > > > > > overall
> > > > > > > > > > direction
> > > > > > > > > > >> > and
> > > > > > > > > > >> > > > the
> > > > > > > > > > >> > > > > > > > > layering.
> > > > > > > > > > >> > > > > > > > > >
> > > > > > > > > > >> > > > > > > > > > [1]
> > > > > > > > > > >> > > > > > > > > >
> > > > > > > > > > >> > > > > > > > >
> > > > > > > > > > >> > > > > > > >
> > > > > > > > > > >> > > > > > >
> > > > > > > > > > >> > > > > >
> > > > > > > > > > >> > > >
> > > > > > > > > > >> >
> > > > > > > > > > >>
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=421957275
> > > > > > > > > > >> > > > > > > > > >
> > > > > > > > > > >> > > > > > > > > > Thanks,
> > > > > > > > > > >> > > > > > > > > > Guowei
> > > > > > > > > > >> > > > > > > > > >
> > > > > > > > > > >> > > > > > > > >
> > > > > > > > > > >> > > > > > > >
> > > > > > > > > > >> > > > > > >
> > > > > > > > > > >> > > > > >
> > > > > > > > > > >> > > >
> > > > > > > > > > >> >
> > > > > > > > > > >>
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > >
>

Reply via email to