Many thanks to Guowei for driving this effort & +1 to this proposal. Currently, multimodal processing is a hot direction and topic, and many new technical requirements have emerged around it. This proposal can significantly expand Flink's scope in real-time multimodal processing -- a highly meaningful initiative.
Furthermore, many of the architectural improvements proposed here (such as the RPC operator) also serve as excellent extensions to Flink's current architectural capabilities. For example, `Remote Compaction` and `Sharable State` can both be built upon this, making it a very fundamental extension. I am very interested in the PRC OP and look forward to the detailed discussions regarding this FLIP. Best Yuan On Thu, May 28, 2026 at 9:18 AM Guowei Ma <[email protected]> wrote: > Hi Martijn, > > Thanks a lot for the thoughtful follow-up, and no worries about the timing. > > Your summary captures the intended framing very well: the umbrella is meant > to align the community on the workload class, the overall direction, and > the relationship between the sub-FLIPs, while each concrete design still > needs to be discussed and evaluated on its own merits. > > I also agree that the feature matrix concern should remain visible during > the sub-FLIP discussions. For engine-level primitives, we should continue > to be explicit about the intended API coverage, staged rollout, and any > limitations, rather than leaving those implicit. > > Thanks again for the careful review. Given the discussion so far, I will > proceed with the vote soon. > > Best, > Guowei > > On Wed, May 27, 2026 at 8:25 PM Martijn Visser <[email protected]> > wrote: > > > Hi Guowei, > > > > Apologies for the late reply. The replies and revisions address > > several of my original concerns. The explicit non-goals, the "Why Now > > / Why Flink" sections, and the conservative defaults on runtime > > mechanisms (opt-in, no behavioral change for existing jobs) all land > > in the right place. > > > > On the umbrella's purpose: I now read it as a directional statement > > that gives the community a shared view on the workload class and how > > the sub-FLIPs relate, with each sub-FLIP discussed and voted on its > > own merits. With that framing, a +1 on the umbrella is a +1 on > > direction, not on any specific design choice. The substantive scrutiny > > lives at sub-FLIP review time, which is where I'd want it anyway. > > > > Brief notes on my four original points: > > > > 1. The "bundling" concern softens once it's clear the umbrella is > > directional. The engine-shaped capabilities (regional checkpoints, > > UAC, non-disruptive scaling) and the AI-shaped capabilities will still > > need to stand on their own at sub-FLIP review. > > 2. The feature matrix concern is still something that I see as > > concerning, but we can have those discussions for each of the > > sub-FLIPs where applicable. I'm happy to see the signals from Jark, > > Dian and Lincoln that there's commitment on that already. > > 3. The Flink ML risk question is addressed in my opinion. > > 4. The evidence question has had the input from others in the > > community, and the updated "Why Now" section. I think we also need to > > be able to experiment and see what's working on this front, but that > > shouldn't be a reason to hold back on the overall idea. > > > > To summarize, I'm comfortable with the vote proceeding. Apologies for > > the delay on my side. > > > > Best regards, > > > > Martijn > > > > Op do 21 mei 2026 om 11:19 schreef Guowei Ma <[email protected]>: > > > > > > Hi Martijn > > > > > > Thanks for the heads-up. > > > > > > That sounds good to me. Since the proposal has been stable for a while, > > > could > > > you share a rough idea of when you expect to finish your final review? > A > > > tentative date would be enough, and I can hold off starting the vote > > until > > > then. > > > > > > Best, > > > Guowei > > > > > > > > > On Thu, May 21, 2026 at 4:47 PM Martijn Visser < > [email protected] > > > > > > wrote: > > > > > > > Hi Guowei, > > > > > > > > I would like to have a couple of more days to take a closer final > look > > > > with a final round of comments, I haven't been able to finalize > > > > everything yet on this subject. > > > > > > > > Best regards, > > > > > > > > Martijn > > > > > > > > Op wo 20 mei 2026 om 15:04 schreef Guowei Ma <[email protected]>: > > > > > > > > > > Hi all, > > > > > > > > > > Thanks everyone for the discussion and feedback on the FLIP. > > > > > > > > > > If there are no further comments or concerns, I plan to start the > > vote on > > > > > Thursday. > > > > > > > > > > > > > > > Best, > > > > > Guowei > > > > > > > > > > > > > > > On Fri, May 15, 2026 at 12:21 PM Jiangang Liu < > > [email protected] > > > > > > > > > > wrote: > > > > > > > > > > > I Greatly support this proposal. AI-oriented and multimodal data > > > > processing > > > > > > is already becoming a real production workload, and Flink is well > > > > > > positioned to serve as the dataflow backbone for these pipelines. > > > > > > > > > > > > The proposal correctly focuses on engine-level capabilities > rather > > than > > > > > > turning Flink into an AI framework or model serving platform. In > > > > > > particular, the shared runtime contract across multimodal types, > > object > > > > > > references, inference invocation, GPU resource management, > > scaling, and > > > > > > recovery is essential. Without such coordination, users would > have > > to > > > > > > stitch together fragmented abstractions themselves. I also > > appreciate > > > > the > > > > > > incremental and backward-compatible approach, as well as the > > > > conservative > > > > > > scope for correctness-sensitive runtime changes. > > > > > > > > > > > > This direction can help Flink extend its existing strengths in > > > > streaming, > > > > > > checkpointing, and production operations to a new class of > > AI-driven > > > > > > workloads. > > > > > > > > > > > > Best > > > > > > Jiangang Liu > > > > > > > > > > > > > > > > > > Lincoln Lee <[email protected]> 于2026年5月11日周一 14:47写道: > > > > > > > > > > > > > Hi Guowei and all, > > > > > > > > > > > > > > Big +1 on the overall direction of FLIP-577. > > > > > > > > > > > > > > From the SQL ecosystem perspective, over the past year the > > community > > > > > > > has introduced AI functions (e.g., ML_PREDICT, VECTOR_SEARCH) > to > > help > > > > > > > users leverage Flink in AI scenarios. Through community > > interactions > > > > > > > and enterprise user feedback since then, we can strongly sense > > the > > > > > > > growing expectation for Flink to go further — broader > multimodal > > data > > > > > > > processing capabilities, improved runtime stability under AI > > > > > > > workloads, and more optimized execution for inference-heavy > > > > pipelines. > > > > > > > > > > > > > > This FLIP addresses exactly those gaps. In particular, I'm > > excited > > > > about: > > > > > > > > > > > > > > - Multimodal data type extensions — first-class types like > > > > > > > Tensor, Image, Embedding, and Audio in the type system will > > unlock > > > > > > > native SQL/Table expressions over multimodal data, rather > than > > > > > > > forcing users to work around opaque BYTES columns. > > > > > > > - Object reference mechanism — a critical optimization for > > > > > > > large-payload multimodal data. Passing multi-megabyte blobs > > through > > > > > > > the standard data path is a known pain point today; a > > > > > > > reference-based approach will significantly reduce > > serialization > > > > > > > overhead and memory pressure. > > > > > > > - Richer built-in AI functions — standardizing common > multimodal > > > > > > > operations (embedding generation, similarity search, model > > > > > > > inference) as engine-level functions will eliminate the > > repeated > > > > > > > ad-hoc UDF implementations we see across organizations today, > > and > > > > > > > open the door for engine-level optimizations like shared > model > > > > > > > handles and adaptive batching. > > > > > > > > > > > > > > As Jark mentioned, we'd like to participate in and contribute > to > > the > > > > > > > relevant sub-FLIPs. > > > > > > > > > > > > > > > > > > > > > Best, > > > > > > > Lincoln Lee > > > > > > > > > > > > > > > > > > > > > Guowei Ma <[email protected]> 于2026年5月10日周日 21:26写道: > > > > > > > > > > > > > > > Hi all, > > > > > > > > > > > > > > > > Thanks again for the thoughtful feedback and the valuable > > > > perspectives > > > > > > > > shared in this discussion. > > > > > > > > > > > > > > > > I have updated FLIP-577 [1] based on the discussion in this > > > > thread. The > > > > > > > > overall direction remains the same, but I have tried to make > > the > > > > scope, > > > > > > > > motivation, and boundaries clearer. > > > > > > > > > > > > > > > > The main changes are: > > > > > > > > > > > > > > > > 1. > > > > > > > > > > > > > > > > Clarified the target workload as AI-oriented data > processing > > > > > > > workloads, > > > > > > > > instead of relying only on the broader "AI-Native" > wording. > > > > > > > > 2. > > > > > > > > > > > > > > > > Added explicit non-goals to make clear that this proposal > > does > > > > not > > > > > > aim > > > > > > > > to turn Flink into an AI framework, ML platform, or model > > > > serving > > > > > > > > system. > > > > > > > > 3. > > > > > > > > > > > > > > > > Added "Why Now" and "Why Flink" sections to better explain > > the > > > > > > > > production signals, ecosystem trends, and why Flink's > > existing > > > > > > runtime > > > > > > > > strengths are relevant here. > > > > > > > > 4. > > > > > > > > > > > > > > > > Reworked the umbrella rationale. The key point is not that > > every > > > > > > > > sub-FLIP is AI-specific, but that these mechanisms need a > > shared > > > > > > > runtime > > > > > > > > contract across data representation, service invocation, > GPU > > > > > > > resources, > > > > > > > > scaling, and recovery. > > > > > > > > 5. > > > > > > > > > > > > > > > > Clarified that engine-level primitives should consider > > > > SQL/Table, > > > > > > Java > > > > > > > > DataStream, and Python DataFrame. > > > > > > > > 6. > > > > > > > > > > > > > > > > Made the initial correctness scope of runtime mechanisms — > > > > > > > > non-disruptive scaling, UAC enhancements, and Pipeline > > Region > > > > > > > > checkpoints — > > > > > > > > more conservative, with explicit opt-in where default > > behavior > > > > is > > > > > > > > affected. > > > > > > > > > > > > > > > > I also tried to reflect the earlier questions raised in this > > > > thread in > > > > > > > the > > > > > > > > corresponding sections of the updated document. > > > > > > > > > > > > > > > > Looking forward to the continued discussion. > > > > > > > > > > > > > > > > [1] > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=421957275 > > > > > > > > > > > > > > > > Best, > > > > > > > > Guowei > > > > > > > > > > > > > > > > > > > > > > > > On Sun, May 10, 2026 at 4:19 PM Xintong Song < > > > > [email protected]> > > > > > > > > wrote: > > > > > > > > > > > > > > > > > Hi all, > > > > > > > > > > > > > > > > > > First, big thanks to Guowei for kicking off this important > > > > > > discussion, > > > > > > > > and > > > > > > > > > to the community for the substantive engagement over the > past > > > > days. > > > > > > I'd > > > > > > > > > also like to share some of my own thoughts. > > > > > > > > > > > > > > > > > > My apologies for joining late — I've been tied up with > other > > > > things > > > > > > and > > > > > > > > > only recently started catching up. Given the length of this > > > > thread, > > > > > > > it's > > > > > > > > > possible I haven't fully digested every detail. If I miss > > > > anything, > > > > > > > > please > > > > > > > > > bear with me and feel free to point it out. > > > > > > > > > > > > > > > > > > Let me state my position upfront: a big +1 to this FLIP. > > Based > > > > on the > > > > > > > > > workload evolution I've been observing, I believe moving > > Flink > > > > toward > > > > > > > > > AI-Native is necessary — this is a project- and > > community-level > > > > > > effort > > > > > > > > that > > > > > > > > > matters deeply for how Flink responds to the industry shift > > > > driven by > > > > > > > AI, > > > > > > > > > and how it captures the new opportunities that come with > it. > > We > > > > may > > > > > > > still > > > > > > > > > need more discussion to align on specific technical > details, > > but > > > > at > > > > > > the > > > > > > > > > strategic level of where the community should head, I > > strongly > > > > > > support > > > > > > > > this > > > > > > > > > umbrella. > > > > > > > > > > > > > > > > > > Below are my thoughts on a few key questions raised in the > > > > discussion > > > > > > > so > > > > > > > > > far. > > > > > > > > > > > > > > > > > > > > > > > > > > > 1. On the scope of the umbrella > > > > > > > > > > > > > > > > > > Robert, Yaroslav, and Martijn have all pointed out that > some > > > > items in > > > > > > > the > > > > > > > > > FLIP — Pipeline Region independent checkpointing, UAC > > > > enhancements, > > > > > > > > > non-disruptive scaling, etc. — are not unique to AI > > scenarios and > > > > > > would > > > > > > > > > also apply elsewhere. That observation is fair. > > > > > > > > > > > > > > > > > > However, I'd argue the right criterion for whether > something > > > > belongs > > > > > > > > under > > > > > > > > > this umbrella is not whether it is *exclusively* used for > > AI, but > > > > > > > whether > > > > > > > > > it is *essential* to the central goal here — AI-Native > Flink > > — > > > > i.e., > > > > > > > > > whether AI-Native Flink would actually fall short without > it. > > > > > > > > > > > > > > > > > > From that angle, I agree with what Guowei has already laid > > out: > > > > under > > > > > > > AI > > > > > > > > > workload characteristics (per-record processing cost orders > > of > > > > > > > magnitude > > > > > > > > > higher, GPU resources expensive and scarce, long and > > long-tailed > > > > > > > > > asynchronous inference), these otherwise-general mechanisms > > get > > > > > > pushed > > > > > > > to > > > > > > > > > the limit of their current implementations. They may be > > adequate > > > > for > > > > > > > > > traditional BI/ETL workloads, but they become real > > bottlenecks > > > > under > > > > > > AI > > > > > > > > > workloads. Without them, the AI-Native Flink story would be > > > > > > incomplete. > > > > > > > > So > > > > > > > > > I support including them under this umbrella. > > > > > > > > > > > > > > > > > > > > > > > > > > > 2. On usage and maintenance complexity > > > > > > > > > > > > > > > > > > I understand the concerns Yaroslav and Martijn raised about > > > > > > complexity > > > > > > > > > costs. That said, I don't think we should conservatively > > reject a > > > > > > > > > capability because of complexity, if that capability is > > critical > > > > and > > > > > > > > brings > > > > > > > > > substantial benefits. On the necessity of RpcOperator > > > > specifically, > > > > > > Zhu > > > > > > > > > Zhu, Guowei, and others have already provided answers > > earlier in > > > > this > > > > > > > > > thread, and I'll add some context from the Flink Agents > side > > > > below. > > > > > > > > > > > > > > > > > > As a side observation: with the rapid progress of > AI-assisted > > > > > > > development > > > > > > > > > over the past year or two, the way engineers work is > > shifting in > > > > a > > > > > > > > > noticeable way. On one hand, AI coding tools are helping > > > > developers > > > > > > > > > maintain complex software systems at lower effort. On the > > other > > > > hand, > > > > > > > > more > > > > > > > > > and more users are starting to ask AI how to use Flink > > > > correctly, and > > > > > > > > even > > > > > > > > > letting AI help them develop and operate Flink workloads. > > Usage > > > > and > > > > > > > > > maintenance complexity still matter, but their relative > > weight is > > > > > > > > arguably > > > > > > > > > going down — this is a trend worth factoring into long-term > > > > tradeoffs > > > > > > > at > > > > > > > > > the community level. > > > > > > > > > > > > > > > > > > > > > > > > > > > 3. On "wait for the industry to stabilize before > integrating" > > > > > > > > > > > > > > > > > > Yaroslav raised a concern about the timing — the LLM > tooling > > > > > > landscape > > > > > > > is > > > > > > > > > changing fast, and it's hard to predict what will be needed > > > > tomorrow. > > > > > > > I'd > > > > > > > > > offer a different perspective. > > > > > > > > > > > > > > > > > > It's *precisely because* the industry is evolving fast and > > > > hasn't yet > > > > > > > > > settled into a stable shape that Flink should join earlier > — > > to > > > > > > > influence > > > > > > > > > and help define how that shape forms. If we just stand by > > > > waiting for > > > > > > > the > > > > > > > > > landscape to settle, by the time it does, there may not be > a > > > > place > > > > > > for > > > > > > > > > Flink in it, and we'd have missed an important window. > > > > > > > > > > > > > > > > > > In fact, projects like Daft and Ray Data are increasingly > > > > defining > > > > > > the > > > > > > > de > > > > > > > > > facto standards in this space, as Leonard pointed out > > earlier in > > > > this > > > > > > > > > thread. If Flink doesn't engage proactively, even keeping > up > > > > later > > > > > > will > > > > > > > > put > > > > > > > > > us in a reactive position. > > > > > > > > > > > > > > > > > > Engaging early does carry some cost of trial and error — > some > > > > > > > > capabilities > > > > > > > > > may need to be deprecated or restructured down the line. > But > > I'd > > > > > > argue > > > > > > > > this > > > > > > > > > is a normal part of how open-source projects evolve, and > it's > > > > both > > > > > > > > > necessary and worth it. > > > > > > > > > > > > > > > > > > > > > > > > > > > 4. RpcOperator from a Flink Agents perspective > > > > > > > > > > > > > > > > > > The benefits of RpcOperator have already been articulated > by > > > > several > > > > > > > > people > > > > > > > > > earlier in the thread. I'd like to add a concrete scenario > > from > > > > the > > > > > > > Flink > > > > > > > > > Agents side — the subagent pattern. > > > > > > > > > > > > > > > > > > > > > > > > > > > 4.1 Brief intro to the subagent pattern > > > > > > > > > > > > > > > > > > In agent practice, after planning, the main agent often > > spawns > > > > one or > > > > > > > > more > > > > > > > > > subagents to handle specific subtasks. Each subagent has > its > > own > > > > LLM > > > > > > > > > context — its own system prompt, its own context window, > its > > own > > > > > > > toolset > > > > > > > > > and permissions. The main agent only receives the final > > result > > > > from > > > > > > the > > > > > > > > > subagent, not the intermediate steps. This pattern is now > > widely > > > > > > > adopted > > > > > > > > in > > > > > > > > > the industry, and many agent skills explicitly say "spawn a > > > > subagent > > > > > > to > > > > > > > > do > > > > > > > > > X". > > > > > > > > > > > > > > > > > > Flink Agents wants to support this pattern for several > > reasons: > > > > > > > > > compatibility with the existing agent skill ecosystem; > > context > > > > > > > isolation, > > > > > > > > > so that intermediate state during subagent execution > doesn't > > > > pollute > > > > > > > the > > > > > > > > > main agent's context; and one additional advantage that > > > > enterprise > > > > > > > > > production scenarios have over single-machine > > personal-assistant > > > > > > > agents — > > > > > > > > > the ability to dispatch resource-intensive subtasks to a > > shared > > > > > > > resource > > > > > > > > > pool for efficient resource reuse. > > > > > > > > > > > > > > > > > > > > > > > > > > > 4.2 Why this needs RpcOperator instead of in-place > execution > > > > inside > > > > > > the > > > > > > > > > Flink Agents operator > > > > > > > > > > > > > > > > > > Why not just spawn subagents directly inside the Flink > Agents > > > > > > operator? > > > > > > > > > Because subagent workloads are highly bursty — within the > > same > > > > job, > > > > > > one > > > > > > > > > minute's input might need several subagents for deep > > processing; > > > > the > > > > > > > next > > > > > > > > > minute's input might need none. Under that kind of > > randomness, > > > > > > > statically > > > > > > > > > allocating subagent resources per operator parallelism > > leaves no > > > > good > > > > > > > > > answer: under-allocate and the operator gets blown out the > > moment > > > > > > > demand > > > > > > > > > arrives; over-allocate and resources sit idle most of the > > time. > > > > > > > > > > > > > > > > > > The natural answer is to put subagent execution into a > shared > > > > > > resource > > > > > > > > pool > > > > > > > > > that can be used by multiple upstream operator subtasks, > and > > > > that can > > > > > > > > scale > > > > > > > > > elastically based on load. This maps directly onto the > > design and > > > > > > core > > > > > > > > > benefits of RpcOperator that Zhu Zhu laid out earlier — > each > > > > > > > RpcOperator > > > > > > > > > instance forming its own Pipelined Region, which in turn > > enables > > > > > > > > > independent scaling and flexible load balancing. > > > > > > > > > > > > > > > > > > > > > > > > > > > 4.3 Why this shared resource pool should be inside Flink > > rather > > > > than > > > > > > > > > deployed externally > > > > > > > > > > > > > > > > > > Two reasons. > > > > > > > > > > > > > > > > > > First, the prompts, toolsets, and permission > configurations a > > > > > > subagent > > > > > > > > > needs at execution time are essentially part of the Flink > > Agents > > > > *job > > > > > > > > > definition*. Stripping them out of the job and deploying > them > > > > > > > separately > > > > > > > > > effectively splits the job definition in two, leaving users > > to > > > > > > > maintain a > > > > > > > > > synchronization relationship between an external deployment > > and > > > > the > > > > > > > Flink > > > > > > > > > job. This goes against the developer experience Flink > Agents > > > > aims to > > > > > > > > > provide. > > > > > > > > > > > > > > > > > > Second, subagent execution needs to be incorporated into > > Flink's > > > > > > > > checkpoint > > > > > > > > > mechanism. The notion of "stateful" here deserves a brief > > > > > > > clarification: > > > > > > > > > each subagent task is self-contained — the full context it > > needs > > > > is > > > > > > > > handed > > > > > > > > > over by the main agent at dispatch time in one shot. That > is, > > > > > > subagents > > > > > > > > > don't need to share state across records, so at the task > > level > > > > they > > > > > > > look > > > > > > > > > stateless. However, executing a single subagent task takes > a > > > > > > > significant > > > > > > > > > amount of time and involves multiple rounds of model calls > > and > > > > tool > > > > > > > > calls, > > > > > > > > > where tool calls may have externally observable side > effects > > > > (sending > > > > > > > > > emails, writing to external systems, etc.). This means we > > need > > > > > > failure > > > > > > > > > recovery for the in-flight computation state during task > > > > execution, > > > > > > so > > > > > > > > that > > > > > > > > > subagent execution preserves exactly-once semantics across > > > > failovers. > > > > > > > > > > > > > > > > > > > > > > > > > > > To wrap up, let me reiterate my support for the overall > > > > direction of > > > > > > > this > > > > > > > > > FLIP. There are clearly technical details still to be > worked > > out, > > > > > > but I > > > > > > > > > believe the direction described by this umbrella is both > > > > necessary > > > > > > and > > > > > > > > > important for the long-term evolution of the Flink > community. > > > > Looking > > > > > > > > > forward to continued discussion in the sub-FLIPs ahead. > > > > > > > > > > > > > > > > > > Thanks! > > > > > > > > > > > > > > > > > > Best, > > > > > > > > > > > > > > > > > > Xintong > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Fri, May 8, 2026 at 3:43 PM Guowei Ma < > > [email protected]> > > > > > > wrote: > > > > > > > > > > > > > > > > > > > Hi all, > > > > > > > > > > > > > > > > > > > > Thanks everyone for the in-depth discussion over the past > > few > > > > days. > > > > > > > Let > > > > > > > > > me > > > > > > > > > > first summarize my understanding of the discussion so > far. > > I > > > > see > > > > > > > strong > > > > > > > > > > interest in making Flink better support AI-oriented data > > > > processing > > > > > > > > > > workloads, especially multimodal and inference-oriented > > > > pipelines. > > > > > > At > > > > > > > > the > > > > > > > > > > same time, a recurring concern is that RpcOperator and > > some of > > > > the > > > > > > > > Layer > > > > > > > > > 3 > > > > > > > > > > runtime improvements also look like general Flink > > > > capabilities, so > > > > > > > why > > > > > > > > > > should they be discussed together under the AI-Native / > > > > multimodal > > > > > > > > > > processing umbrella? > > > > > > > > > > > > > > > > > > > > I think the key question is not whether these mechanisms > > can > > > > only > > > > > > be > > > > > > > > used > > > > > > > > > > for AI scenarios, but whether they jointly form the > runtime > > > > > > contract > > > > > > > > > Flink > > > > > > > > > > needs to support AI-oriented data processing workloads, > > > > especially > > > > > > > > > > multimodal inference pipelines. Capabilities such as > > > > RpcOperator, > > > > > > > > > > checkpointing, scaling, and resource management can > > certainly > > > > serve > > > > > > > > > broader > > > > > > > > > > use cases. However, under this class of workloads, they > > require > > > > > > more > > > > > > > > > > consistent runtime semantics and coordinated design, and > > > > together > > > > > > > > > determine > > > > > > > > > > whether the system can provide production-grade > execution, > > > > > > recovery, > > > > > > > > and > > > > > > > > > > elasticity. > > > > > > > > > > > > > > > > > > > > Compared with traditional streaming workloads, this class > > of > > > > > > > workloads > > > > > > > > > > changes the data shape, computation pattern, and resource > > model > > > > > > > > > > significantly. Data is no longer only small structured > > > > records; it > > > > > > > may > > > > > > > > be > > > > > > > > > > images, video, audio, tensors, embeddings, or object > > > > references to > > > > > > > > > external > > > > > > > > > > large objects. Many pipelines also have a relatively > > > > shuffle-light > > > > > > > > shape, > > > > > > > > > > such as URI → preprocessing → inference → sink. The > > computation > > > > > > logic > > > > > > > > > often > > > > > > > > > > includes model inference or service-style invocation, > > either as > > > > > > > remote > > > > > > > > > > inference service invocation or local GPU / > > accelerator-backed > > > > > > > > execution. > > > > > > > > > > Therefore, the system needs to handle not only ordinary > RPC > > > > calls, > > > > > > > but > > > > > > > > > also > > > > > > > > > > service discovery, backpressure propagation, batching / > > > > concurrency > > > > > > > > > > control, timeout / retry, in-flight request draining, > model > > > > > > loading, > > > > > > > > GPU > > > > > > > > > > warmup, resource scheduling, and fault recovery. > > > > > > > > > > > > > > > > > > > > From this perspective, RpcOperator is not just “another > > way to > > > > call > > > > > > > an > > > > > > > > > > external service,” nor is it merely a deployment > mechanism > > for > > > > GPU > > > > > > > > > > operators. More importantly, it defines a service-style > > > > operator > > > > > > > > > > abstraction: when inference becomes part of the logical > > data > > > > flow, > > > > > > > > Flink > > > > > > > > > > needs to understand and coordinate these runtime > semantics, > > > > rather > > > > > > > than > > > > > > > > > > hiding inference completely behind external black-box > calls > > > > inside > > > > > > > user > > > > > > > > > > code. > > > > > > > > > > > > > > > > > > > > Some of the Layer 3 runtime improvements follow the same > > logic. > > > > > > While > > > > > > > > > > mechanisms such as checkpointing or scaling are not > > exclusive > > > > to AI > > > > > > > > > > workloads, inference-oriented workloads fundamentally > > change > > > > their > > > > > > > > > > operational assumptions and cost model, making runtime > > > > behavior far > > > > > > > > more > > > > > > > > > > critical than in traditional data processing systems. > When > > > > > > per-record > > > > > > > > > > computation is expensive, GPU warmup and model loading > are > > > > costly, > > > > > > > and > > > > > > > > > the > > > > > > > > > > execution environment may involve elastic / preemptible > > > > resources, > > > > > > > the > > > > > > > > > cost > > > > > > > > > > of global rollback or disruptive scaling becomes much > > higher > > > > than > > > > > > in > > > > > > > > > > traditional row-at-a-time BI / ETL workloads. As a > result, > > > > runtime > > > > > > > > > behavior > > > > > > > > > > that may have been acceptable for traditional workloads > can > > > > > > directly > > > > > > > > > affect > > > > > > > > > > stability and resource efficiency in inference-oriented > > > > workloads. > > > > > > > > > > > > > > > > > > > > At the same time, the umbrella proposal helps provide a > > shared > > > > > > > context > > > > > > > > > for > > > > > > > > > > discussing how these capabilities relate to each other > and > > what > > > > > > > common > > > > > > > > > > runtime assumptions they rely on. The more important > value > > of > > > > the > > > > > > > > > umbrella > > > > > > > > > > is to align on the workload model, design principles, > > > > boundaries, > > > > > > and > > > > > > > > > > dependencies between capabilities, so that independently > > > > evolving > > > > > > > > pieces > > > > > > > > > > such as RpcOperator, GPU resource declaration, batching / > > > > > > concurrency > > > > > > > > > > control, non-disruptive scaling, and regional > > checkpointing do > > > > not > > > > > > > end > > > > > > > > up > > > > > > > > > > with inconsistent runtime semantics. > > > > > > > > > > > > > > > > > > > > Based on this discussion, I will update the proposal to > > make > > > > the > > > > > > > > workload > > > > > > > > > > model, RpcOperator boundary, and Layer 3 dependency > > > > relationship > > > > > > > > clearer. > > > > > > > > > > > > > > > > > > > > Best, > > > > > > > > > > Guowei > > > > > > > > > > Best, > > > > > > > > > > Guowei > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Fri, May 8, 2026 at 12:37 PM zhangjiaogg < > > > > [email protected]> > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > Hi Guowei and all, > > > > > > > > > > > > > > > > > > > > > > Thank you for driving this initiative. Strong +1 on the > > > > overall > > > > > > > > > > direction. > > > > > > > > > > > > > > > > > > > > > > From our perspective, the core value of this proposal > > lies > > > > in two > > > > > > > > > areas: > > > > > > > > > > > extending Flink's intelligent processing capabilities > for > > > > > > > multimodal > > > > > > > > > > data, > > > > > > > > > > > and enabling native, in-pipeline local inference. As AI > > > > > > > capabilities > > > > > > > > > > > continue to advance, multimodal data is accounting for > an > > > > > > > > increasingly > > > > > > > > > > > large share of overall data volume, and the ability to > > > > perform > > > > > > > > > > intelligent, > > > > > > > > > > > real-time processing on this data — not just ingestion > or > > > > > > routing, > > > > > > > > but > > > > > > > > > > > actual inference and transformation within the stream — > > is > > > > > > > becoming a > > > > > > > > > > > critical requirement across industries. Today, most > > pipelines > > > > > > treat > > > > > > > > > > > multimodal objects as opaque blobs and push inference > to > > > > external > > > > > > > > > > systems, > > > > > > > > > > > which works but at the cost of complexity, latency, and > > > > > > > consistency. > > > > > > > > > > > > > > > > > > > > > > This is exactly the pain one of our customers > > experiences in > > > > > > their > > > > > > > > > > > autonomous driving data pipelines. Video and image data > > > > captured > > > > > > by > > > > > > > > > > onboard > > > > > > > > > > > cameras must go through annotation, frame extraction, > > quality > > > > > > > > > filtering, > > > > > > > > > > > and both unstructured and structured data > transformation > > > > before > > > > > > it > > > > > > > > can > > > > > > > > > be > > > > > > > > > > > used for model training — a workflow that today > requires > > > > > > combining > > > > > > > > > > multiple > > > > > > > > > > > specialized systems: a stream engine for structured > > > > processing, a > > > > > > > > > > separate > > > > > > > > > > > framework (e.g., Ray Data) for multimodal processing, > > and an > > > > > > > external > > > > > > > > > > model > > > > > > > > > > > serving layer for inference. Each system boundary > > introduces > > > > > > > > > intermediate > > > > > > > > > > > storage, operational overhead, and data consistency > > > > challenges > > > > > > > across > > > > > > > > > the > > > > > > > > > > > full pipeline. If Flink can handle this end-to-end — > with > > > > native > > > > > > > > > > multimodal > > > > > > > > > > > types, local GPU inference operators, and unified > > > > checkpointing — > > > > > > > the > > > > > > > > > > > entire workflow becomes a single Flink job, > intermediate > > > > storage > > > > > > is > > > > > > > > > > > eliminated, and fault recovery covers the pipeline as a > > > > whole. > > > > > > > > > > > > > > > > > > > > > > We look forward to the sub-FLIP discussions and would > be > > > > happy to > > > > > > > > > > > contribute. > > > > > > > > > > > > > > > > > > > > > > Best regards > > > > > > > > > > > Jiao Zhang > > > > > > > > > > > > > > > > > > > > > > At 2026-05-07 12:58:15, "zihao chen" < > > [email protected]> > > > > > > > wrote: > > > > > > > > > > > >Hi, all, > > > > > > > > > > > > > > > > > > > > > > > >I’d like to share some thoughts based on our internal > > > > experience > > > > > > > > > > > >with AI workloads on Flink. > > > > > > > > > > > > > > > > > > > > > > > >At Tencent, we have production scenarios where Flink > is > > > > used in > > > > > > > > > > > >AI-related pipelines. > > > > > > > > > > > > > > > > > > > > > > > >Based on these workloads, we explored elasticity and > > > > autoscaling > > > > > > > > > > > >for cloud-native stream processing systems and > > published our > > > > > > > > > > > >experience in SIGMOD 2025: > > > > > > > > > > > > > > > > > > > > > > > >"Oceanus: Enable SLO-Aware Vertical Autoscaling for > > > > > > > > > > > >Cloud-Native Streaming Services in Tencent" [1] > > > > > > > > > > > > > > > > > > > > > > > >As our workloads evolved, we also started to see > > increasing > > > > > > > > > > > >GPU-based training and inference scenarios. > > > > > > > > > > > > > > > > > > > > > > > >Our current solution integrates Flink with external > GPU > > > > > > services. > > > > > > > > > > > >While this works functionally, it also introduces > > several > > > > > > > practical > > > > > > > > > > > >issues, such as: > > > > > > > > > > > > > > > > > > > > > > > > - > > > > > > > > > > > > > > > > > > > > > > > > fragmented lifecycle management > > > > > > > > > > > > - > > > > > > > > > > > > > > > > > > > > > > > > operational complexity > > > > > > > > > > > > - > > > > > > > > > > > > > > > > > > > > > > > > inconsistent scaling/recovery behavior across > systems > > > > > > > > > > > > > > > > > > > > > > > >From this perspective, I think FLIP-577 is exploring a > > very > > > > > > > > > > > >meaningful direction. > > > > > > > > > > > > > > > > > > > > > > > >In particular, I agree with the idea of integrating > > > > GPU-backed > > > > > > > > > > > >computation more naturally into Flink’s runtime model, > > > > instead > > > > > > of > > > > > > > > > > > >treating it purely as an external service integration > > > > problem. > > > > > > > > > > > > > > > > > > > > > > > >Besides, from the elasticity perspective, our > > experience is > > > > that > > > > > > > > > > > >GPU workloads have very different characteristics > > compared > > > > with > > > > > > > > > > > >traditional CPU workloads: > > > > > > > > > > > > > > > > > > > > > > > > - > > > > > > > > > > > > > > > > > > > > > > > > GPU resources are expensive and scarce > > > > > > > > > > > > - > > > > > > > > > > > > > > > > > > > > > > > > Startup and replay costs are significantly higher > > > > > > > > > > > > - > > > > > > > > > > > > > > > > > > > > > > > > Long-running tasks make scaling and recovery more > > > > challenging > > > > > > > > > > > > > > > > > > > > > > > >In our experience, GPU elasticity cannot simply reuse > > the > > > > > > > > > > > >assumptions behind traditional CPU elasticity. > > > > > > > > > > > > > > > > > > > > > > > >Because of this, elasticity becomes especially > > important for > > > > > > > > > > > >production AI workloads, not only for resource > > efficiency, > > > > but > > > > > > > also > > > > > > > > > > > >for reducing scaling and recovery overhead. > > > > > > > > > > > > > > > > > > > > > > > >More broadly, AI workloads increasingly require Flink > to > > > > > > > collaborate > > > > > > > > > > > >more naturally with GPU-backed computation, and I > > believe > > > > > > > > > > > >FLIP-577 is exploring an important direction toward > > > > addressing > > > > > > > > > > > >this gap. > > > > > > > > > > > > > > > > > > > > > > > >Overall, I’m looking forward to further discussions > > about > > > > this > > > > > > > FLIP. > > > > > > > > > > > > > > > > > > > > > > > >[1] > https://dl.acm.org/doi/abs/10.1145/3722212.3724445 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >Best, > > > > > > > > > > > >Zihao Chen > > > > > > > > > > > > > > > > > > > > > > > >Yong Fang <[email protected]> 于2026年5月7日周四 11:11写道: > > > > > > > > > > > > > > > > > > > > > > > >> Hi devs, > > > > > > > > > > > >> > > > > > > > > > > > >> Thanks Guowei for initiating this proposal. I think > > this > > > > is an > > > > > > > > > > important > > > > > > > > > > > >> step for Flink towards the era of AI data > processing, > > > > very big > > > > > > > +1. > > > > > > > > > > > >> > > > > > > > > > > > >> I’d like to share some scenarios and requirements of > > > > > > leveraging > > > > > > > > > > PyFlink > > > > > > > > > > > for > > > > > > > > > > > >> AI data processing at ByteDance. Currently, we run > > tens of > > > > > > > > thousands > > > > > > > > > > of > > > > > > > > > > > >> pyflink/flink jobs, using millions of CPU cores. > > > > > > > > > > > >> > > > > > > > > > > > >> 1) Multimodal Data Processing > > > > > > > > > > > >> We want to use PyFlink to generate multimodal > feature > > > > data. > > > > > > The > > > > > > > > > > typical > > > > > > > > > > > >> workflow starts by reading ID-based raw data and > > > > performing > > > > > > > large > > > > > > > > > > table > > > > > > > > > > > >> joins and ETL computations. We then fetch multimodal > > > > assets > > > > > > such > > > > > > > > as > > > > > > > > > > > images, > > > > > > > > > > > >> videos, texts and audios from object storage by ID. > > These > > > > > > > > multimodal > > > > > > > > > > > data > > > > > > > > > > > >> are either sent to an RPC service (backed by local > > models > > > > or > > > > > > > > remote > > > > > > > > > > > large > > > > > > > > > > > >> models), or processed via local GPU computing for > > frame > > > > > > > > extraction, > > > > > > > > > > > >> embedding generation and other tasks. After > multimodal > > > > > > > > computation, > > > > > > > > > > > output > > > > > > > > > > > >> results including embeddings, processed images and > > > > multimodal > > > > > > > > > metadata > > > > > > > > > > > are > > > > > > > > > > > >> generated and persisted into the downstream > multimodal > > > > data > > > > > > > lake. > > > > > > > > > > > >> > > > > > > > > > > > >> 2) Stream-Batch Unified Data Training > > > > > > > > > > > >> We use PyFlink to consume processed sample data from > > MQ or > > > > > > data > > > > > > > > > lakes. > > > > > > > > > > > >> Within the data pipeline, data may be shuffled by > key, > > > > then > > > > > > fed > > > > > > > > into > > > > > > > > > > > >> parameter servers or local services for CPU-based or > > > > GPU-based > > > > > > > > model > > > > > > > > > > > >> training. Such workloads strongly demand optimized > > CPU & > > > > GPU > > > > > > > > hybrid > > > > > > > > > > > >> scheduling, worker node restart capability, fast > > scaling, > > > > as > > > > > > > well > > > > > > > > as > > > > > > > > > > > native > > > > > > > > > > > >> support for unified streaming and batch training. > > > > > > > > > > > >> > > > > > > > > > > > >> While supporting the above two categories of > > workloads, we > > > > > > have > > > > > > > > > > several > > > > > > > > > > > >> common requirements for PyFlink and Flink Core: > > > > > > > > > > > >> > > > > > > > > > > > >> 1) Native Python Computing Capability > > > > > > > > > > > >> We need more user-friendly DataFrame APIs, > > comprehensive > > > > > > > built-in > > > > > > > > > data > > > > > > > > > > > >> types for image and audio, as well as richer > > multimodal > > > > > > > computing > > > > > > > > > > > >> operators. This enables users to develop multimodal > > data > > > > > > > > processing > > > > > > > > > > jobs > > > > > > > > > > > >> more efficiently, and allows better optimization and > > > > > > scheduling > > > > > > > at > > > > > > > > > the > > > > > > > > > > > >> execution plan layer. > > > > > > > > > > > >> > > > > > > > > > > > >> 2) CPU & GPU Scheduling and Resource Management > > > > > > > > > > > >> It is expected to tag resource requirements at > > > > fine-grained > > > > > > > levels > > > > > > > > > > such > > > > > > > > > > > as > > > > > > > > > > > >> python user-defined functions. The scheduler should > > > > provide > > > > > > > > enhanced > > > > > > > > > > > >> resource orchestration and task scheduling, enabling > > more > > > > > > > flexible > > > > > > > > > > > >> heterogeneous resource management. > > > > > > > > > > > >> > > > > > > > > > > > >> 3) Embedded Server Node Capability in Pipeline > > > > > > > > > > > >> We hope to launch dedicated server nodes inside a > > single > > > > > > > pipeline, > > > > > > > > > to > > > > > > > > > > > load > > > > > > > > > > > >> local models or access remote models, which may be > > shared > > > > > > > between > > > > > > > > > > > different > > > > > > > > > > > >> operators in the pipeline. This unifies data ETL and > > > > > > multimodal > > > > > > > > > > > computing > > > > > > > > > > > >> within one end-to-end pipeline, greatly simplifying > > > > operation > > > > > > > and > > > > > > > > > > > >> maintenance for business teams. > > > > > > > > > > > >> > > > > > > > > > > > >> 4) Performance and Stability Optimization > > > > > > > > > > > >> Key enhancements include zero-copy data transfer > > between > > > > Flink > > > > > > > TM > > > > > > > > > > > processes > > > > > > > > > > > >> and Python processes, fast scaling of compute nodes, > > fast > > > > > > > > > > checkpointing > > > > > > > > > > > >> mechanism, and shuffle optimization for large-scale > > > > datasets. > > > > > > > > These > > > > > > > > > > > >> improvements will significantly boost the > performance > > and > > > > > > > > stability > > > > > > > > > of > > > > > > > > > > > >> PyFlink multimodal workloads. > > > > > > > > > > > >> > > > > > > > > > > > >> I’m really excited to see that FLIP-577 has covered > > all > > > > the > > > > > > real > > > > > > > > > > > production > > > > > > > > > > > >> scenarios and requirements mentioned above. It's a > > good > > > > chance > > > > > > > to > > > > > > > > > > > iterate > > > > > > > > > > > >> and enrich the core capabilities of PyFlink and > Flink > > Core > > > > > > > > targeting > > > > > > > > > > > these > > > > > > > > > > > >> AI data processing scenarios, and build Flink into a > > > > > > first-class > > > > > > > > AI > > > > > > > > > > data > > > > > > > > > > > >> processing engine. > > > > > > > > > > > >> > > > > > > > > > > > >> I’m very much looking forward to the progress. > > > > > > > > > > > >> > > > > > > > > > > > >> Best, > > > > > > > > > > > >> FangYong > > > > > > > > > > > >> > > > > > > > > > > > >> On Wed, May 6, 2026 at 8:36 PM FeatZhang < > > > > > > [email protected]> > > > > > > > > > > wrote: > > > > > > > > > > > >> > > > > > > > > > > > >> > Hi all, > > > > > > > > > > > >> > > > > > > > > > > > > >> > This is a great topic, and honestly long overdue. > > > > > > > > > > > >> > > > > > > > > > > > > >> > With the rapid growth of AI applications, we have > > been > > > > > > seeing > > > > > > > a > > > > > > > > > > > >> > significant increase in real-world demands from > > users > > > > who > > > > > > are > > > > > > > > > > already > > > > > > > > > > > >> > building on Flink and other traditional data > > processing > > > > or > > > > > > BI > > > > > > > > > > engines. > > > > > > > > > > > >> > From a platform perspective, more and more teams > are > > > > trying > > > > > > to > > > > > > > > > > > >> > integrate AI capabilities directly into their > > existing > > > > > > > streaming > > > > > > > > > > > >> > pipelines, rather than treating them as separate > > > > systems. > > > > > > > > > > > >> > > > > > > > > > > > > >> > This is not an isolated trend — it is becoming a > > common > > > > > > > > > requirement > > > > > > > > > > > >> > across industries. > > > > > > > > > > > >> > > > > > > > > > > > > >> > ________________________________ > > > > > > > > > > > >> > > > > > > > > > > > > >> > 1. What is changing in real systems > > > > > > > > > > > >> > > > > > > > > > > > > >> > We are observing a consistent shift: > > > > > > > > > > > >> > > > > > > > > > > > > >> > AI is moving from offline analysis or request-time > > > > scoring > > > > > > to > > > > > > > > > > > >> > continuous, event-driven decision making. > > > > > > > > > > > >> > > > > > > > > > > > > >> > In other words: > > > > > > > > > > > >> > > > > > > > > > > > > >> > AI is becoming part of the data stream itself. > > > > > > > > > > > >> > > > > > > > > > > > > >> > ________________________________ > > > > > > > > > > > >> > > > > > > > > > > > > >> > 2. Representative production scenarios > > > > > > > > > > > >> > > > > > > > > > > > > >> > 2.1 Real-time fraud detection (per-event decision > > under > > > > > > strict > > > > > > > > > > > latency) > > > > > > > > > > > >> > > > > > > > > > > > > >> > Typical setup > > > > > > > > > > > >> > > > > > > > > > > > > >> > Continuous transaction stream (payments, logins, > > > > transfers) > > > > > > > > > > > >> > Each event must be evaluated within milliseconds > > > > > > > > > > > >> > Decision depends on: > > > > > > > > > > > >> > > > > > > > > > > > > >> > recent user behavior > > > > > > > > > > > >> > device / IP patterns > > > > > > > > > > > >> > short-term aggregates > > > > > > > > > > > >> > > > > > > > > > > > > >> > What is happening in practice > > > > > > > > > > > >> > > > > > > > > > > > > >> > Models are already deeply integrated into decision > > flow > > > > > > > > > > > >> > Feature freshness directly impacts detection > > accuracy > > > > > > > > > > > >> > > > > > > > > > > > > >> > Current pain > > > > > > > > > > > >> > > > > > > > > > > > > >> > Features computed in streaming, but inference is > > remote > > > > > > > > > > > >> > Network overhead adds to critical path latency > > > > > > > > > > > >> > Hard to ensure training/serving consistency > > > > > > > > > > > >> > > > > > > > > > > > > >> > ________________________________ > > > > > > > > > > > >> > > > > > > > > > > > > >> > 2.2 Real-time recommendation and ads (continuous > > > > re-ranking) > > > > > > > > > > > >> > > > > > > > > > > > > >> > Typical setup > > > > > > > > > > > >> > > > > > > > > > > > > >> > User interaction stream (click, view, dwell time) > > > > > > > > > > > >> > Continuous feature updates (session + short-term > > > > behavior) > > > > > > > > > > > >> > Inference triggered per interaction > > > > > > > > > > > >> > > > > > > > > > > > > >> > What is happening in practice > > > > > > > > > > > >> > > > > > > > > > > > > >> > Increasing reliance on real-time context > > > > > > > > > > > >> > Model-based ranking becomes core logic > > > > > > > > > > > >> > > > > > > > > > > > > >> > Current pain > > > > > > > > > > > >> > > > > > > > > > > > > >> > Offline and online feature pipelines diverge > > > > > > > > > > > >> > Training-serving skew is common > > > > > > > > > > > >> > Inference orchestration is ad-hoc > > > > > > > > > > > >> > > > > > > > > > > > > >> > ________________________________ > > > > > > > > > > > >> > > > > > > > > > > > > >> > 2.3 Streaming RAG / knowledge systems (continuous > > > > indexing) > > > > > > > > > > > >> > > > > > > > > > > > > >> > Typical setup > > > > > > > > > > > >> > > > > > > > > > > > > >> > Continuous ingestion of documents, logs, or > > knowledge > > > > > > > > > > > >> > Pipeline: > > > > > > > > > > > >> > > > > > > > > > > > > >> > chunking → embedding → indexing → retrieval > > > > > > > > > > > >> > > > > > > > > > > > > >> > Typical use cases > > > > > > > > > > > >> > > > > > > > > > > > > >> > AI copilots > > > > > > > > > > > >> > enterprise knowledge assistants > > > > > > > > > > > >> > observability systems > > > > > > > > > > > >> > > > > > > > > > > > > >> > Current pain > > > > > > > > > > > >> > > > > > > > > > > > > >> > Built via loosely coupled services or scripts > > > > > > > > > > > >> > No strong consistency guarantees > > > > > > > > > > > >> > Difficult to scale and recover > > > > > > > > > > > >> > > > > > > > > > > > > >> > ________________________________ > > > > > > > > > > > >> > > > > > > > > > > > > >> > 2.4 Real-time feedback loop (continuous > evaluation) > > > > > > > > > > > >> > > > > > > > > > > > > >> > Typical setup > > > > > > > > > > > >> > > > > > > > > > > > > >> > Prediction at time T > > > > > > > > > > > >> > Label arrives at T + Δ > > > > > > > > > > > >> > > > > > > > > > > > > >> > Required processing > > > > > > > > > > > >> > > > > > > > > > > > > >> > prediction stream JOIN label stream → metrics → > > > > optimization > > > > > > > > > > > >> > > > > > > > > > > > > >> > Current pain > > > > > > > > > > > >> > > > > > > > > > > > > >> > Alignment logic duplicated across systems > > > > > > > > > > > >> > Late data handling is complex > > > > > > > > > > > >> > No reusable evaluation abstraction > > > > > > > > > > > >> > > > > > > > > > > > > >> > ________________________________ > > > > > > > > > > > >> > > > > > > > > > > > > >> > 2.5 AI replacing rule-based decision logic > > > > > > > > > > > >> > > > > > > > > > > > > >> > Evolution > > > > > > > > > > > >> > > > > > > > > > > > > >> > From: > > > > > > > > > > > >> > > > > > > > > > > > > >> > rule engine / CEP > > > > > > > > > > > >> > > > > > > > > > > > > >> > To: > > > > > > > > > > > >> > > > > > > > > > > > > >> > model / LLM → decision > > > > > > > > > > > >> > > > > > > > > > > > > >> > Implication > > > > > > > > > > > >> > > > > > > > > > > > > >> > AI is becoming the core decision layer inside > > streaming > > > > > > > systems. > > > > > > > > > > > >> > > > > > > > > > > > > >> > ________________________________ > > > > > > > > > > > >> > > > > > > > > > > > > >> > 3. Architectural shift > > > > > > > > > > > >> > > > > > > > > > > > > >> > Across all scenarios: > > > > > > > > > > > >> > > > > > > > > > > > > >> > From: > > > > > > > > > > > >> > > > > > > > > > > > > >> > data processing → feature system → model serving → > > > > > > evaluation > > > > > > > > > > > >> > > > > > > > > > > > > >> > To: > > > > > > > > > > > >> > > > > > > > > > > > > >> > stream = feature + inference + decision + feedback > > > > > > > > > > > >> > > > > > > > > > > > > >> > This reflects a fundamental change in system > > boundaries. > > > > > > > > > > > >> > > > > > > > > > > > > >> > ________________________________ > > > > > > > > > > > >> > > > > > > > > > > > > >> > 4. Why externalized architectures break down > > > > > > > > > > > >> > > > > > > > > > > > > >> > Most current implementations rely on multiple > > systems: > > > > > > > > > > > >> > > > > > > > > > > > > >> > stream processing > > > > > > > > > > > >> > feature store > > > > > > > > > > > >> > model serving > > > > > > > > > > > >> > vector database > > > > > > > > > > > >> > > > > > > > > > > > > >> > This introduces several fundamental issues in > > real-time > > > > > > > > scenarios. > > > > > > > > > > > >> > > > > > > > > > > > > >> > 4.1 Latency dominated by system boundaries > > > > > > > > > > > >> > > > > > > > > > > > > >> > stream → network → model service → response > > > > > > > > > > > >> > > > > > > > > > > > > >> > network overhead is unavoidable > > > > > > > > > > > >> > batching is not controlled by the stream runtime > > > > > > > > > > > >> > no end-to-end backpressure > > > > > > > > > > > >> > > > > > > > > > > > > >> > Latency becomes a system-level artifact rather > than > > a > > > > > > compute > > > > > > > > > > > property. > > > > > > > > > > > >> > > > > > > > > > > > > >> > ________________________________ > > > > > > > > > > > >> > > > > > > > > > > > > >> > 4.2 Inconsistent data between training and serving > > > > > > > > > > > >> > > > > > > > > > > > > >> > offline vs online features > > > > > > > > > > > >> > different definitions or time windows > > > > > > > > > > > >> > > > > > > > > > > > > >> > Models operate on inconsistent data distributions. > > > > > > > > > > > >> > > > > > > > > > > > > >> > ________________________________ > > > > > > > > > > > >> > > > > > > > > > > > > >> > 4.3 State fragmentation > > > > > > > > > > > >> > > > > > > > > > > > > >> > user/session context must be rebuilt or fetched > > > > > > > > > > > >> > loss of data locality > > > > > > > > > > > >> > processing becomes call-driven > > > > > > > > > > > >> > > > > > > > > > > > > >> > ________________________________ > > > > > > > > > > > >> > > > > > > > > > > > > >> > 4.4 Feedback loop is not composable > > > > > > > > > > > >> > > > > > > > > > > > > >> > difficult alignment of prediction and label > streams > > > > > > > > > > > >> > no unified handling of late data > > > > > > > > > > > >> > duplicated evaluation logic > > > > > > > > > > > >> > > > > > > > > > > > > >> > ________________________________ > > > > > > > > > > > >> > > > > > > > > > > > > >> > 4.5 Operational complexity > > > > > > > > > > > >> > > > > > > > > > > > > >> > multiple systems to scale > > > > > > > > > > > >> > multiple failure domains > > > > > > > > > > > >> > complex debugging paths > > > > > > > > > > > >> > > > > > > > > > > > > >> > ________________________________ > > > > > > > > > > > >> > > > > > > > > > > > > >> > 5. Why this aligns with Flink > > > > > > > > > > > >> > > > > > > > > > > > > >> > These workloads require: > > > > > > > > > > > >> > > > > > > > > > > > > >> > event-driven execution > > > > > > > > > > > >> > strong state management > > > > > > > > > > > >> > precise time semantics > > > > > > > > > > > >> > continuous feedback > > > > > > > > > > > >> > > > > > > > > > > > > >> > These are exactly Flink’s core strengths. > > > > > > > > > > > >> > > > > > > > > > > > > >> > The key insight is: > > > > > > > > > > > >> > > > > > > > > > > > > >> > Inference should be modeled as a dataflow > operator, > > not > > > > an > > > > > > > > > external > > > > > > > > > > > >> > service. > > > > > > > > > > > >> > > > > > > > > > > > > >> > ________________________________ > > > > > > > > > > > >> > > > > > > > > > > > > >> > 6. Implication > > > > > > > > > > > >> > > > > > > > > > > > > >> > If we model: > > > > > > > > > > > >> > > > > > > > > > > > > >> > data → feature → inference → decision → feedback > > > > > > > > > > > >> > > > > > > > > > > > > >> > within Flink, we can achieve: > > > > > > > > > > > >> > > > > > > > > > > > > >> > unified scheduling > > > > > > > > > > > >> > shared state > > > > > > > > > > > >> > consistent time semantics > > > > > > > > > > > >> > end-to-end fault tolerance > > > > > > > > > > > >> > > > > > > > > > > > > >> > ________________________________ > > > > > > > > > > > >> > > > > > > > > > > > > >> > 7. Conclusion > > > > > > > > > > > >> > > > > > > > > > > > > >> > This is not simply about adding AI support to > Flink. > > > > > > > > > > > >> > > > > > > > > > > > > >> > It is about recognizing that: > > > > > > > > > > > >> > > > > > > > > > > > > >> > Real-time AI systems are fundamentally streaming > > > > systems. > > > > > > > > > > > >> > > > > > > > > > > > > >> > The question is whether Flink evolves to support > > this > > > > > > > natively, > > > > > > > > or > > > > > > > > > > > >> > remains a preprocessing layer in front of external > > AI > > > > > > stacks. > > > > > > > > > > > >> > > > > > > > > > > > > >> > ________________________________ > > > > > > > > > > > >> > > > > > > > > > > > > >> > Happy to follow up with a more concrete proposal > > (e.g., > > > > > > > > inference > > > > > > > > > > > >> > operator abstraction) if there is interest. > > > > > > > > > > > >> > > > > > > > > > > > > >> > Thanks. > > > > > > > > > > > >> > > > > > > > > > > > > >> > On Mon, May 4, 2026 at 10:31 PM Gen Luo < > > > > > > [email protected]> > > > > > > > > > > wrote: > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > Hi all, > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > Thank you Guowei Ma for driving this discussion, > > and > > > > > > thanks > > > > > > > > > > everyone > > > > > > > > > > > >> for > > > > > > > > > > > >> > > the valuable insights. Inspired by this > exchange, > > I’d > > > > like > > > > > > > to > > > > > > > > > > share > > > > > > > > > > > a > > > > > > > > > > > >> few > > > > > > > > > > > >> > > thoughts. > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > While “AI-Native” covers broad ground, I believe > > this > > > > FLIP > > > > > > > > does > > > > > > > > > > not > > > > > > > > > > > >> > > overextend Flink’s scope. It’s a necessary > > iteration > > > > > > driven > > > > > > > by > > > > > > > > > > > evolving > > > > > > > > > > > >> > > user scenarios and AI advancements, particularly > > > > > > multimodal > > > > > > > > > > > processing. > > > > > > > > > > > >> > > Given the growing adoption of multimodal > > applications > > > > and > > > > > > > > > > increasing > > > > > > > > > > > >> > > interest in low-latency inference, initiating > > these > > > > > > > > enhancements > > > > > > > > > > is > > > > > > > > > > > a > > > > > > > > > > > >> > > timely step to better align Flink with evolving > AI > > > > > > > workloads. > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > From our engagements with customers and > > developers, we > > > > > > > > observe a > > > > > > > > > > > clear > > > > > > > > > > > >> > > shift in both workloads and user expectations. > > Model > > > > > > > inference > > > > > > > > > is > > > > > > > > > > > >> > > increasingly central to data pipelines, with > > > > multimodal AI > > > > > > > > tasks > > > > > > > > > > > >> growing > > > > > > > > > > > >> > > rapidly. Traditional real-time scenarios (e.g., > > > > monitoring > > > > > > > and > > > > > > > > > > > >> analytics) > > > > > > > > > > > >> > > now leverage models and agent frameworks like > > Flink > > > > Agent > > > > > > > for > > > > > > > > > > > >> > intelligent, > > > > > > > > > > > >> > > multi-turn decision-making, while large-scale > > offline > > > > > > > compute > > > > > > > > is > > > > > > > > > > > also > > > > > > > > > > > >> > > shifting toward LLMs and vision models. > Alongside > > this > > > > > > > > workload > > > > > > > > > > > >> > evolution, > > > > > > > > > > > >> > > developer workflows have adapted: AI > practitioners > > > > > > naturally > > > > > > > > > > prefer > > > > > > > > > > > >> > Python > > > > > > > > > > > >> > > and DataFrame-style APIs. As AI-assisted coding > > > > matures, > > > > > > > > > aligning > > > > > > > > > > > >> system > > > > > > > > > > > >> > > interfaces with these familiar patterns will > > directly > > > > > > > improve > > > > > > > > > > > >> > AI-generated > > > > > > > > > > > >> > > code quality and significantly lower adoption > > > > barriers for > > > > > > > the > > > > > > > > > AI > > > > > > > > > > > >> > community. > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > Today, many AI evaluation tools don’t yet > > recommend > > > > Flink > > > > > > > for > > > > > > > > AI > > > > > > > > > > > >> > > workloads—largely due to limited visibility of > > Flink’s > > > > > > > > relevant > > > > > > > > > > > >> > > capabilities rather than fundamental > > incompatibility. > > > > In > > > > > > > > > reality, > > > > > > > > > > > Flink > > > > > > > > > > > >> > has > > > > > > > > > > > >> > > unique strengths here. For example, generating > > > > multimodal > > > > > > > > > samples > > > > > > > > > > is > > > > > > > > > > > >> > often > > > > > > > > > > > >> > > a multi-day, GPU-heavy process. Flink’s > streaming > > > > model, > > > > > > > > > combined > > > > > > > > > > > with > > > > > > > > > > > >> > > checkpointing and reduced disk I/O, is > > well-suited for > > > > > > such > > > > > > > > > > > >> long-running > > > > > > > > > > > >> > > tasks—a direction also pursued by engines like > > Daft > > > > and > > > > > > Ray > > > > > > > > > Data. > > > > > > > > > > > With > > > > > > > > > > > >> > > Flink’s proven production stability, we’re > > > > well-positioned > > > > > > > for > > > > > > > > > > both > > > > > > > > > > > >> batch > > > > > > > > > > > >> > > and future real-time multimodal streaming > > inference. > > > > > > > Targeted > > > > > > > > > > > >> > improvements > > > > > > > > > > > >> > > can make these advantages visible, driving > better > > user > > > > > > > > > experiences > > > > > > > > > > > and > > > > > > > > > > > >> > > healthier ecosystem growth. > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > I’d also note a lesson from FlinkML. It > attempted > > to > > > > cover > > > > > > > > model > > > > > > > > > > > >> training > > > > > > > > > > > >> > > but struggled to align with the fast-iteration, > > > > > > > > > > > Python/notebook-centric > > > > > > > > > > > >> > > workflows preferred by ML researchers. Flink’s > > core > > > > > > strength > > > > > > > > > lies > > > > > > > > > > in > > > > > > > > > > > >> > > high-concurrency, production-grade inference > > > > > > > orchestration—not > > > > > > > > > > > training > > > > > > > > > > > >> > > lifecycle management (e.g., experiment tracking, > > > > > > > versioning). > > > > > > > > > This > > > > > > > > > > > >> > mismatch > > > > > > > > > > > >> > > limited its adoption. > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > This proposal, however, takes a different path. > It > > > > doesn’t > > > > > > > aim > > > > > > > > > to > > > > > > > > > > > >> replace > > > > > > > > > > > >> > > training frameworks. Instead, it introduces > > modern AI > > > > > > > concepts > > > > > > > > > > > >> > (multimodal > > > > > > > > > > > >> > > data, LLMs) as first-class citizens for > inference, > > > > built > > > > > > > atop > > > > > > > > > > > Flink’s > > > > > > > > > > > >> > > computation strengths. Think Ray Data’s scope > > (plus > > > > simple > > > > > > > > > > > co-located > > > > > > > > > > > >> > > serving), not Train/Tune. Crucially, unlike the > > > > FlinkML > > > > > > era, > > > > > > > > > > today’s > > > > > > > > > > > >> > models > > > > > > > > > > > >> > > use standardized interfaces and mature serving > > > > frameworks, > > > > > > > > > > allowing > > > > > > > > > > > >> Flink > > > > > > > > > > > >> > > to integrate external models seamlessly without > > heavy > > > > > > > > > > > >> > > customization—significantly lowering project > risk. > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > This FLIP marks Flink’s another starting point > for > > > > the AI > > > > > > > era. > > > > > > > > > > While > > > > > > > > > > > >> > > details need refinement, I believe this > direction > > > > aligns > > > > > > > with > > > > > > > > > both > > > > > > > > > > > >> > current > > > > > > > > > > > >> > > and future user needs and Flink’s evolution. > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > Best, > > > > > > > > > > > >> > > Gen > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > On Mon, May 4, 2026 at 12:15 PM Jark Wu < > > > > [email protected] > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > > Hi Guowei, > > > > > > > > > > > >> > > > > > > > > > > > > > > >> > > > Thanks for driving this. +1 on the overall > > > > direction. > > > > > > > > Flink's > > > > > > > > > > > >> > > > streaming processing and checkpoint mechanism > > give > > > > it a > > > > > > > > > > structural > > > > > > > > > > > >> > > > advantage over systems like Daft and Ray. But > > today, > > > > > > these > > > > > > > > > > runtime > > > > > > > > > > > >> > > > strengths are held back by gaps in Python API, > > GPU > > > > > > > > scheduling, > > > > > > > > > > and > > > > > > > > > > > >> > > > native multimodal data handling. This umbrella > > FLIP > > > > > > > > addresses > > > > > > > > > > > exactly > > > > > > > > > > > >> > > > that gap, comprehensively and systematically. > I > > > > believe > > > > > > > > > > multimodal > > > > > > > > > > > >> > > > data processing is the biggest opportunity for > > > > > > traditional > > > > > > > > > data > > > > > > > > > > > infra > > > > > > > > > > > >> > > > to transition into AI infra, and this is one > of > > the > > > > most > > > > > > > > > > important > > > > > > > > > > > >> > > > FLIPs for Flink in the AI era. > > > > > > > > > > > >> > > > > > > > > > > > > > > >> > > > As one of the Table/SQL module maintainers, we > > would > > > > > > like > > > > > > > to > > > > > > > > > > > >> > > > contribute the built-in multimodal processing > > UDFs > > > > > > (audio, > > > > > > > > > > video, > > > > > > > > > > > >> > > > image, text) and native multimodal data types > > > > (Tensor, > > > > > > > > Image, > > > > > > > > > > > >> > > > Embedding, etc.) as first-class citizens in > the > > type > > > > > > > system. > > > > > > > > > > > Looking > > > > > > > > > > > >> > > > forward to the sub-FLIP discussions. > > > > > > > > > > > >> > > > > > > > > > > > > > > >> > > > Best, > > > > > > > > > > > >> > > > Jark > > > > > > > > > > > >> > > > > > > > > > > > > > > >> > > > On Thu, 30 Apr 2026 at 18:42, Guowei Ma < > > > > > > > > [email protected] > > > > > > > > > > > > > > > > > > > > > >> wrote: > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > Hi,Yaroslav > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > Thanks for taking the time to write this > > detailed > > > > > > > > feedback. > > > > > > > > > > Let > > > > > > > > > > > me > > > > > > > > > > > >> > > > clarify > > > > > > > > > > > >> > > > > the intent of the proposal first. > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > I am not saying that Flink should become an > AI > > > > > > > framework, > > > > > > > > an > > > > > > > > > > ML > > > > > > > > > > > >> > platform, > > > > > > > > > > > >> > > > > or a model serving system. The way I use > > > > "AI-Native" > > > > > > in > > > > > > > > this > > > > > > > > > > > >> > proposal is > > > > > > > > > > > >> > > > to > > > > > > > > > > > >> > > > > say that Flink should support, as > first-class > > > > > > citizens, > > > > > > > > the > > > > > > > > > > core > > > > > > > > > > > >> > objects > > > > > > > > > > > >> > > > > and execution patterns that frequently show > > up in > > > > > > > > > AI-oriented > > > > > > > > > > > data > > > > > > > > > > > >> > > > > processing — instead of leaving them > entirely > > to > > > > > > > external > > > > > > > > > > > systems > > > > > > > > > > > >> or > > > > > > > > > > > >> > ad > > > > > > > > > > > >> > > > hoc > > > > > > > > > > > >> > > > > user-defined integrations. > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > These objects and execution patterns > include: > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > - Multimodal and unstructured data > objects > > > > such as > > > > > > > > > images, > > > > > > > > > > > >> video, > > > > > > > > > > > >> > > > audio, > > > > > > > > > > > >> > > > > tensors, embeddings, and object > references. > > > > > > > > > > > >> > > > > - Model inference as part of the data > flow, > > > > rather > > > > > > > than > > > > > > > > > an > > > > > > > > > > > >> > entirely > > > > > > > > > > > >> > > > > external black-box service call. > > > > > > > > > > > >> > > > > - Operators backed by heterogeneous > > resources > > > > such > > > > > > as > > > > > > > > > GPUs. > > > > > > > > > > > >> > > > > - Pythonic and vectorized processing > > styles. > > > > > > > > > > > >> > > > > - Long-running, long-tailed asynchronous > > > > > > computation. > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > "AI-Native" is just a shorthand here, > meaning > > that > > > > > > Flink > > > > > > > > > > should > > > > > > > > > > > >> > natively > > > > > > > > > > > >> > > > > understand and support the core abstractions > > of > > > > this > > > > > > > class > > > > > > > > > of > > > > > > > > > > > >> > workloads. > > > > > > > > > > > >> > > > > The FLIP needs to make the target workload > > class > > > > > > > clearer. > > > > > > > > > What > > > > > > > > > > > we > > > > > > > > > > > >> > care > > > > > > > > > > > >> > > > > about is not any specific model paradigm — > > LLM, > > > > CV, > > > > > > > > > > > recommendation, > > > > > > > > > > > >> > or > > > > > > > > > > > >> > > > > traditional ML inference — but a class of > data > > > > > > > processing > > > > > > > > > > > workloads > > > > > > > > > > > >> > with > > > > > > > > > > > >> > > > > shared runtime and topology characteristics: > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > - A single computation may take seconds > or > > even > > > > > > > > minutes, > > > > > > > > > > > instead > > > > > > > > > > > >> > of > > > > > > > > > > > >> > > > > microseconds as in traditional > > row-at-a-time > > > > > > > > processing. > > > > > > > > > > > >> > > > > - Execution often involves heterogeneous > > > > resources > > > > > > > such > > > > > > > > > as > > > > > > > > > > > CPU + > > > > > > > > > > > >> > GPU, > > > > > > > > > > > >> > > > > where GPUs are expensive and scarce. > > > > > > > > > > > >> > > > > - Data is often multimodal large objects > > > > (images, > > > > > > > > video, > > > > > > > > > > > audio, > > > > > > > > > > > >> > > > tensors, > > > > > > > > > > > >> > > > > embeddings), rather than structured small > > > > records. > > > > > > > > > > > >> > > > > - Computation logic often includes model > > > > inference > > > > > > or > > > > > > > > > > > >> > service-style > > > > > > > > > > > >> > > > > invocations as part of the pipeline. > > > > > > > > > > > >> > > > > - Many target topologies are relatively > > > > > > shuffle-light > > > > > > > > and > > > > > > > > > > > don't > > > > > > > > > > > >> > > > > necessarily involve complex keyed-state > > > > migration, > > > > > > > e.g. > > > > > > > > > > URI → > > > > > > > > > > > >> > > > preprocessing > > > > > > > > > > > >> > > > > → inference → sink. > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > Ten years ago, many ML workloads took the > > form of > > > > > > > offline > > > > > > > > > > > training > > > > > > > > > > > >> > plus > > > > > > > > > > > >> > > > > online feature serving. Flink already > played a > > > > strong > > > > > > > role > > > > > > > > > in > > > > > > > > > > > >> feature > > > > > > > > > > > >> > > > > engineering, streaming feature computation, > > and > > > > > > > real-time > > > > > > > > > data > > > > > > > > > > > >> > > > preparation, > > > > > > > > > > > >> > > > > so there was no strong need to reshape Flink > > into > > > > an > > > > > > > > > > "ML-Native" > > > > > > > > > > > >> > engine. > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > What is changing today is that model > inference > > > > itself > > > > > > is > > > > > > > > > > > >> increasingly > > > > > > > > > > > >> > > > > becoming part of the data processing > pipeline; > > > > > > > multimodal > > > > > > > > > > > objects > > > > > > > > > > > >> > are no > > > > > > > > > > > >> > > > > longer just opaque blobs in external > storage, > > but > > > > data > > > > > > > > > objects > > > > > > > > > > > that > > > > > > > > > > > >> > need > > > > > > > > > > > >> > > > to > > > > > > > > > > > >> > > > > be referenced, passed, transformed, inferred > > > > over, and > > > > > > > > > landed > > > > > > > > > > > >> inside > > > > > > > > > > > >> > the > > > > > > > > > > > >> > > > > engine. This is not simply one more ML use > > case — > > > > it > > > > > > is > > > > > > > a > > > > > > > > > > > change in > > > > > > > > > > > >> > the > > > > > > > > > > > >> > > > > shape of workloads Flink needs to support. > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > On whether the user demand is real, the > > validation > > > > > > > signals > > > > > > > > > we > > > > > > > > > > > are > > > > > > > > > > > >> > > > currently > > > > > > > > > > > >> > > > > seeing include: > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > - Within Alibaba, multimodal data > > processing is > > > > > > > already > > > > > > > > > in > > > > > > > > > > > >> > production, > > > > > > > > > > > >> > > > > covering image, video, audio, and text > > > > modalities. > > > > > > > > > > > >> > > > > - In offline conversations with several > > > > companies > > > > > > > > > > (including > > > > > > > > > > > >> > ByteDance > > > > > > > > > > > >> > > > > and Tencent), we have heard substantial > > demand > > > > for > > > > > > > > Flink > > > > > > > > > to > > > > > > > > > > > >> > support > > > > > > > > > > > >> > > > AI data > > > > > > > > > > > >> > > > > processing / multimodal data processing. > > > > > > > > > > > >> > > > > - On the ecosystem side, we are working > > with > > > > NVIDIA > > > > > > > on > > > > > > > > a > > > > > > > > > > > joint > > > > > > > > > > > >> > demo > > > > > > > > > > > >> > > > > focused on multimodal data processing, > > planned > > > > for > > > > > > > > Flink > > > > > > > > > > > Forward > > > > > > > > > > > >> > Asia. > > > > > > > > > > > >> > > > > - The emergence and growth of systems > such > > as > > > > Daft, > > > > > > > Ray > > > > > > > > > > Data, > > > > > > > > > > > >> > > > > Data-Juicer, and LAS also reflect rapidly > > > > growing > > > > > > > > demand > > > > > > > > > > for > > > > > > > > > > > >> > > > multimodal > > > > > > > > > > > >> > > > > data processing. > > > > > > > > > > > >> > > > > - There have also been independent > > discussions > > > > in > > > > > > > this > > > > > > > > > > > direction > > > > > > > > > > > >> > > > within > > > > > > > > > > > >> > > > > the community — for example, the > > > > "Streaming-native > > > > > > AI > > > > > > > > > > > Inference > > > > > > > > > > > >> > > > Runtime > > > > > > > > > > > >> > > > > Layer" proposal on the dev list. > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > On "why now, instead of waiting for > > > > standardization" > > > > > > — I > > > > > > > > > > > understand > > > > > > > > > > > >> > the > > > > > > > > > > > >> > > > > concern. LLM-related frameworks, APIs, and > > > > > > > > application-level > > > > > > > > > > > >> > patterns are > > > > > > > > > > > >> > > > > indeed changing quickly. If this FLIP were > > trying > > > > to > > > > > > > bake > > > > > > > > a > > > > > > > > > > > >> specific > > > > > > > > > > > >> > LLM > > > > > > > > > > > >> > > > > API, agent framework, or prompt protocol > into > > > > Flink, > > > > > > the > > > > > > > > > risk > > > > > > > > > > > would > > > > > > > > > > > >> > be > > > > > > > > > > > >> > > > high. > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > But most of the capabilities in this > proposal > > are > > > > not > > > > > > > > > > > LLM-specific. > > > > > > > > > > > >> > They > > > > > > > > > > > >> > > > > are more fundamental data processing and > > runtime > > > > > > > > > capabilities: > > > > > > > > > > > >> > Pipeline > > > > > > > > > > > >> > > > > Region-level checkpointing, Object > Reference, > > GPU > > > > > > > resource > > > > > > > > > > > >> > declaration, > > > > > > > > > > > >> > > > > columnar data transfer, service-style > operator > > > > > > > invocation, > > > > > > > > > > > >> > long-running > > > > > > > > > > > >> > > > > async execution. These are useful for > today's > > LLM > > > > > > > > workloads, > > > > > > > > > > and > > > > > > > > > > > >> > equally > > > > > > > > > > > >> > > > > useful for future AI workloads in shapes we > > cannot > > > > > > fully > > > > > > > > > > predict > > > > > > > > > > > >> > yet. The > > > > > > > > > > > >> > > > > fast-changing parts should live in the > > ecosystem > > > > and > > > > > > SDK > > > > > > > > > > layer; > > > > > > > > > > > the > > > > > > > > > > > >> > FLIP > > > > > > > > > > > >> > > > > should focus on more stable engine-level > > > > capabilities. > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > On tactical changes vs. umbrella, I partly > > agree > > > > with > > > > > > > you. > > > > > > > > > > Each > > > > > > > > > > > >> > sub-FLIP > > > > > > > > > > > >> > > > > should be discussed, reviewed, and accepted > or > > > > > > rejected > > > > > > > on > > > > > > > > > its > > > > > > > > > > > own > > > > > > > > > > > >> > > > merits. > > > > > > > > > > > >> > > > > The umbrella should not bypass the normal > FLIP > > > > > > process, > > > > > > > > and > > > > > > > > > > > >> > accepting the > > > > > > > > > > > >> > > > > umbrella does not mean accepting all > > sub-FLIPs. > > > > That > > > > > > > > said, I > > > > > > > > > > > still > > > > > > > > > > > >> > think > > > > > > > > > > > >> > > > > the umbrella is valuable. Its purpose is not > > to > > > > bind > > > > > > the > > > > > > > > 11 > > > > > > > > > > > changes > > > > > > > > > > > >> > into > > > > > > > > > > > >> > > > a > > > > > > > > > > > >> > > > > single inseparable package, but to help the > > > > community > > > > > > > > align > > > > > > > > > on > > > > > > > > > > > >> > > > principles, > > > > > > > > > > > >> > > > > clarify boundaries and dependencies, and > avoid > > > > > > > conflicting > > > > > > > > > or > > > > > > > > > > > >> > duplicated > > > > > > > > > > > >> > > > > abstractions across related capabilities. > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > For example, if RpcOperator is not > considered > > > > together > > > > > > > > with > > > > > > > > > > > >> > > > non-disruptive > > > > > > > > > > > >> > > > > scaling, it is hard to give GPU operator > > > > elasticity > > > > > > > > coherent > > > > > > > > > > > >> > semantics. > > > > > > > > > > > >> > > > > Deploying inference services independently > is > > > > only the > > > > > > > > first > > > > > > > > > > > step; > > > > > > > > > > > >> > the > > > > > > > > > > > >> > > > > harder question is how Flink uniformly > handles > > > > service > > > > > > > > > > > discovery, > > > > > > > > > > > >> > > > in-flight > > > > > > > > > > > >> > > > > request draining, backpressure, and failover > > > > during > > > > > > > > scaling. > > > > > > > > > > > >> Without > > > > > > > > > > > >> > an > > > > > > > > > > > >> > > > > umbrella, these capabilities can certainly > be > > > > advanced > > > > > > > as > > > > > > > > > > > tactical > > > > > > > > > > > >> > > > changes, > > > > > > > > > > > >> > > > > but we may end up with a set of abstractions > > that > > > > are > > > > > > > > > locally > > > > > > > > > > > >> usable > > > > > > > > > > > >> > but > > > > > > > > > > > >> > > > > globally inconsistent. > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > On RpcOperator, I agree that we need to be > > very > > > > > > careful > > > > > > > in > > > > > > > > > > > defining > > > > > > > > > > > >> > the > > > > > > > > > > > >> > > > > boundary between the Flink runtime and > > external > > > > > > > > > orchestration > > > > > > > > > > > >> > systems. > > > > > > > > > > > >> > > > > Kubernetes or the Kubernetes Operator may > > well be > > > > the > > > > > > > > right > > > > > > > > > > > choice > > > > > > > > > > > >> > at the > > > > > > > > > > > >> > > > > physical deployment level. But I still > believe > > > > Flink > > > > > > > > needs a > > > > > > > > > > > >> > first-class > > > > > > > > > > > >> > > > > RpcOperator abstraction, because deployment > is > > > > only > > > > > > part > > > > > > > > of > > > > > > > > > > the > > > > > > > > > > > >> > problem — > > > > > > > > > > > >> > > > > the harder part is its semantic integration > > with > > > > the > > > > > > > Flink > > > > > > > > > > job. > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > If model inference is part of the logical > data > > > > flow, > > > > > > > Flink > > > > > > > > > > > needs at > > > > > > > > > > > >> > > > minimum > > > > > > > > > > > >> > > > > to be aware of its service discovery, > > backpressure > > > > > > > > behavior, > > > > > > > > > > > >> failover > > > > > > > > > > > >> > > > > behavior, in-flight request draining, and > > scaling > > > > > > > > > > coordination. > > > > > > > > > > > If > > > > > > > > > > > >> > it is > > > > > > > > > > > >> > > > > hidden entirely behind an external black-box > > > > service, > > > > > > it > > > > > > > > is > > > > > > > > > > hard > > > > > > > > > > > >> for > > > > > > > > > > > >> > > > Flink > > > > > > > > > > > >> > > > > to provide consistent job-level semantics > and > > > > > > > operational > > > > > > > > > > > >> experience. > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > So the point of RpcOperator is not > necessarily > > > > that > > > > > > > "every > > > > > > > > > > > physical > > > > > > > > > > > >> > > > process > > > > > > > > > > > >> > > > > must be directly launched and managed by > Flink > > > > core," > > > > > > > but > > > > > > > > > that > > > > > > > > > > > >> Flink > > > > > > > > > > > >> > > > needs > > > > > > > > > > > >> > > > > to define a service-style operator contract > > that > > > > > > allows > > > > > > > > such > > > > > > > > > > > >> > operators to > > > > > > > > > > > >> > > > > be invoked correctly by the data flow, > > coordinated > > > > > > > > correctly > > > > > > > > > > by > > > > > > > > > > > the > > > > > > > > > > > >> > > > > runtime, and understood and operated by > users > > as > > > > part > > > > > > > of a > > > > > > > > > > Flink > > > > > > > > > > > >> job. > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > On vectorized batch processing, I agree the > > > > long-term > > > > > > > > > > direction > > > > > > > > > > > >> > should > > > > > > > > > > > >> > > > not > > > > > > > > > > > >> > > > > stop at Python. Native columnar / vectorized > > > > execution > > > > > > > is > > > > > > > > an > > > > > > > > > > > >> > end-to-end > > > > > > > > > > > >> > > > > problem that touches connectors, formats, > the > > type > > > > > > > system, > > > > > > > > > > > runtime, > > > > > > > > > > > >> > Java, > > > > > > > > > > > >> > > > > SQL, and Python. The current proposal starts > > from > > > > the > > > > > > > > > > > Java/Python > > > > > > > > > > > >> > > > boundary > > > > > > > > > > > >> > > > > because that is where the row/column > > conversion > > > > > > overhead > > > > > > > > is > > > > > > > > > > most > > > > > > > > > > > >> > visible. > > > > > > > > > > > >> > > > > End-to-end columnar execution on the Java > and > > SQL > > > > side > > > > > > > > > > deserves > > > > > > > > > > > to > > > > > > > > > > > >> be > > > > > > > > > > > >> > > > > discussed further as a separate, larger > FLIP. > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > On multimodal types and SerDes complexity, I > > agree > > > > > > this > > > > > > > > > needs > > > > > > > > > > > to be > > > > > > > > > > > >> > > > handled > > > > > > > > > > > >> > > > > carefully. Making AI-related objects > > first-class > > > > does > > > > > > > not > > > > > > > > > > imply > > > > > > > > > > > >> that > > > > > > > > > > > >> > > > every > > > > > > > > > > > >> > > > > connector must immediately and fully support > > > > image, > > > > > > > video, > > > > > > > > > > > audio, > > > > > > > > > > > >> > tensor, > > > > > > > > > > > >> > > > > and so on. The concrete incremental path, > > fallback > > > > > > > > strategy, > > > > > > > > > > and > > > > > > > > > > > >> the > > > > > > > > > > > >> > > > > boundary between formats, connector API, and > > the > > > > type > > > > > > > > system > > > > > > > > > > > will > > > > > > > > > > > >> be > > > > > > > > > > > >> > > > > discussed further in the corresponding > > sub-FLIPs. > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > Coming back to the core of the proposal: it > > is not > > > > > > about > > > > > > > > > > turning > > > > > > > > > > > >> > Flink > > > > > > > > > > > >> > > > into > > > > > > > > > > > >> > > > > an AI framework. It is about making the core > > > > objects > > > > > > and > > > > > > > > > > > execution > > > > > > > > > > > >> > > > patterns > > > > > > > > > > > >> > > > > of AI-oriented data processing first-class > > > > citizens in > > > > > > > > > Flink. > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > Best, > > > > > > > > > > > >> > > > > Guowei > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > On Thu, Apr 30, 2026 at 5:37 AM Yaroslav > > > > Tkachenko < > > > > > > > > > > > >> > [email protected] > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > wrote: > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > Hi Guowei, > > > > > > > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > Thank you for writing this proposal. > > > > > > > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > I may be in the minority here, but I hope > my > > > > voice > > > > > > > will > > > > > > > > be > > > > > > > > > > > >> heard. I > > > > > > > > > > > >> > > > > > disagree with turning Flink into an > > "AI-Native" > > > > > > > engine. > > > > > > > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > Regarding your "Data processing is > entering > > the > > > > AI > > > > > > > era, > > > > > > > > > and > > > > > > > > > > > Flink > > > > > > > > > > > >> > > > needs to > > > > > > > > > > > >> > > > > > evolve from a traditional BI compute > engine > > > > into a > > > > > > > data > > > > > > > > > > engine > > > > > > > > > > > >> that > > > > > > > > > > > >> > > > > > natively supports AI workloads" claim: > > > > > > > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > - How exactly do you define "AI"? I don't > > > > believe > > > > > > > there > > > > > > > > > is a > > > > > > > > > > > >> > standard > > > > > > > > > > > >> > > > > > definition. For example, Machine Learning > > have > > > > been > > > > > > > > around > > > > > > > > > > for > > > > > > > > > > > >> more > > > > > > > > > > > >> > > > than a > > > > > > > > > > > >> > > > > > decade, but there were no proposals (or > > need, > > > > in my > > > > > > > > > opinion) > > > > > > > > > > > to > > > > > > > > > > > >> > turn > > > > > > > > > > > >> > > > Flink > > > > > > > > > > > >> > > > > > into an "ML-Native" engine. Flink, in its > > > > current > > > > > > > state, > > > > > > > > > has > > > > > > > > > > > >> > > > > > been successfully used in many systems > > alongside > > > > > > > > dedicated > > > > > > > > > > ML > > > > > > > > > > > >> > > > technologies, > > > > > > > > > > > >> > > > > > like feature stores. Based on the context > of > > > > your > > > > > > > > > proposal, > > > > > > > > > > it > > > > > > > > > > > >> > looks > > > > > > > > > > > >> > > > like > > > > > > > > > > > >> > > > > > you mostly mean LLMs, so could you be > > specific > > > > about > > > > > > > the > > > > > > > > > > > >> language? > > > > > > > > > > > >> > > > > > - I wouldn't call Flink "a traditional BI > > > > compute > > > > > > > > engine". > > > > > > > > > > > Flink > > > > > > > > > > > >> > is a > > > > > > > > > > > >> > > > > > general data processing technology which > > can be > > > > used > > > > > > > > for a > > > > > > > > > > > >> variety > > > > > > > > > > > >> > of > > > > > > > > > > > >> > > > use > > > > > > > > > > > >> > > > > > cases without any BI involvement. > > > > > > > > > > > >> > > > > > - Do you have any proof that "Users' core > > > > workloads > > > > > > > are > > > > > > > > > > > rapidly > > > > > > > > > > > >> > > > evolving" > > > > > > > > > > > >> > > > > > and that they require your proposed > changes? > > > > Case > > > > > > > > studies, > > > > > > > > > > > user > > > > > > > > > > > >> > > > surveys, or > > > > > > > > > > > >> > > > > > submitted issues about the lack of > support? > > Big > > > > > > > changes > > > > > > > > > like > > > > > > > > > > > that > > > > > > > > > > > >> > > > require > > > > > > > > > > > >> > > > > > extensive validation. > > > > > > > > > > > >> > > > > > - And even if there is a real need to > adopt > > some > > > > > > > > > LLM-driven > > > > > > > > > > > >> > changes, > > > > > > > > > > > >> > > > why > > > > > > > > > > > >> > > > > > now? The LLM-related tooling has been > > changing > > > > so > > > > > > > > rapidly, > > > > > > > > > > and > > > > > > > > > > > >> it's > > > > > > > > > > > >> > > > hard to > > > > > > > > > > > >> > > > > > predict what will be needed tomorrow. Why > > does > > > > it > > > > > > make > > > > > > > > > sense > > > > > > > > > > > to > > > > > > > > > > > >> > > > introduce > > > > > > > > > > > >> > > > > > changes now, and not wait for more > > > > standardization > > > > > > and > > > > > > > > > > > >> > consolidation? > > > > > > > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > To summarize, I think there are a lot of > > great > > > > ideas > > > > > > > in > > > > > > > > > the > > > > > > > > > > > >> > proposal, > > > > > > > > > > > >> > > > but > > > > > > > > > > > >> > > > > > in my mind, they need to be addressed as > > > > tactical, > > > > > > > > focused > > > > > > > > > > > >> > changes, not > > > > > > > > > > > >> > > > > > under the "AI-Native" umbrella. > > > > > > > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > I also wanted to address a few more > specific > > > > points: > > > > > > > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > - RpcOperator, why does it need to be > > managed by > > > > > > > Flink? > > > > > > > > I > > > > > > > > > > see > > > > > > > > > > > >> > > > absolutely no > > > > > > > > > > > >> > > > > > need to introduce the additional > complexity > > of > > > > > > > > > orchestrating > > > > > > > > > > > >> > standalone > > > > > > > > > > > >> > > > > > components into the core Flink engine. I > can > > > > > > imagine a > > > > > > > > > > > separate > > > > > > > > > > > >> > > > sub-project > > > > > > > > > > > >> > > > > > for an RpcOperator, which could > potentially > > be > > > > > > managed > > > > > > > > by > > > > > > > > > > the > > > > > > > > > > > >> > > > Kubernetes > > > > > > > > > > > >> > > > > > Operator. > > > > > > > > > > > >> > > > > > - You make the case for the vectorized > batch > > > > > > > processing, > > > > > > > > > but > > > > > > > > > > > only > > > > > > > > > > > >> > on > > > > > > > > > > > >> > > > the > > > > > > > > > > > >> > > > > > Python side. Why stop there? Native > columnar > > > > > > > vectorized > > > > > > > > > > > execution > > > > > > > > > > > >> > will > > > > > > > > > > > >> > > > > > require end-to-end changes, including > > > > connectors, > > > > > > data > > > > > > > > > > format > > > > > > > > > > > >> > support, > > > > > > > > > > > >> > > > Type > > > > > > > > > > > >> > > > > > system support, runtime changes, etc. It > > seems > > > > > > logical > > > > > > > > to > > > > > > > > > me > > > > > > > > > > > to > > > > > > > > > > > >> > support > > > > > > > > > > > >> > > > > > this execution mode for Java and SQL as > > well. > > > > > > > > > > > >> > > > > > - Supporting many more data types natively > > > > (images, > > > > > > > > video, > > > > > > > > > > > audio, > > > > > > > > > > > >> > > > tensors) > > > > > > > > > > > >> > > > > > will make connector serializers and > > > > deserializers > > > > > > > > (SerDes) > > > > > > > > > > > much > > > > > > > > > > > >> > more > > > > > > > > > > > >> > > > > > challenging to implement. Even today, many > > > > SerDes in > > > > > > > > > > > officially > > > > > > > > > > > >> > > > supported > > > > > > > > > > > >> > > > > > connectors don't fully implement types > like > > > > arrays > > > > > > and > > > > > > > > > > > structs. > > > > > > > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > Thank you. > > > > > > > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > On Wed, Apr 29, 2026 at 1:18 AM Guowei Ma > < > > > > > > > > > > > [email protected]> > > > > > > > > > > > >> > > > wrote: > > > > > > > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > Hi Z > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > Thanks for the kind words and the > > thoughtful > > > > > > > > questions. > > > > > > > > > > Let > > > > > > > > > > > me > > > > > > > > > > > >> > take > > > > > > > > > > > >> > > > them > > > > > > > > > > > >> > > > > > > one by one. > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > 1. Throughput and latency targets > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > To be honest, I don't have concrete > > numbers to > > > > > > share > > > > > > > > > yet. > > > > > > > > > > > What > > > > > > > > > > > >> I > > > > > > > > > > > >> > can > > > > > > > > > > > >> > > > say > > > > > > > > > > > >> > > > > > is > > > > > > > > > > > >> > > > > > > that our internal testing has already > > surfaced > > > > > > > several > > > > > > > > > > > >> directions > > > > > > > > > > > >> > > > where > > > > > > > > > > > >> > > > > > > Flink can be improved, and at the same > > time we > > > > > > want > > > > > > > to > > > > > > > > > > fully > > > > > > > > > > > >> > leverage > > > > > > > > > > > >> > > > > > > Flink's existing streaming shuffle > > > > capabilities. > > > > > > As > > > > > > > > the > > > > > > > > > > > >> > multimodal > > > > > > > > > > > >> > > > > > operator > > > > > > > > > > > >> > > > > > > library matures, we'll progressively > > publish > > > > > > > benchmark > > > > > > > > > > > results. > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > 2. Built-in operators > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > You're absolutely right. From what I've > > seen, > > > > our > > > > > > > > > internal > > > > > > > > > > > >> users > > > > > > > > > > > >> > > > already > > > > > > > > > > > >> > > > > > > rely on a fairly large set of multimodal > > > > > > operators — > > > > > > > > > > > >> potentially > > > > > > > > > > > >> > > > 100+. > > > > > > > > > > > >> > > > > > The > > > > > > > > > > > >> > > > > > > exact set the community should provide > is > > best > > > > > > > > discussed > > > > > > > > > > in > > > > > > > > > > > >> > FLIP-XXX: > > > > > > > > > > > >> > > > > > > Built-in Multimodal Operators and AI > > > > Functions, > > > > > > and > > > > > > > > > > > >> contributions > > > > > > > > > > > >> > > > from > > > > > > > > > > > >> > > > > > the > > > > > > > > > > > >> > > > > > > community are very welcome there. > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > 3. Plan for the 11 sub-FLIPs > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > The sequencing follows the layering in > the > > > > > > umbrella: > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > - Layer 1 (Core Primitives) should be > > > > discussed > > > > > > > and > > > > > > > > > > > aligned > > > > > > > > > > > >> > first, > > > > > > > > > > > >> > > > > > since > > > > > > > > > > > >> > > > > > > the second and third layers build on > > it. > > > > > > > > > > > >> > > > > > > - Layer 2 (API + compilation + > > single-node > > > > > > > > execution) > > > > > > > > > > > starts > > > > > > > > > > > >> > with > > > > > > > > > > > >> > > > > > > getting the API discussion right — > the > > > > Python > > > > > > > API, > > > > > > > > > how > > > > > > > > > > > UDFs > > > > > > > > > > > >> > > > declare > > > > > > > > > > > >> > > > > > > resources, etc. — after which the > > > > single-node > > > > > > > > > execution > > > > > > > > > > > work > > > > > > > > > > > >> > can > > > > > > > > > > > >> > > > build > > > > > > > > > > > >> > > > > > > on > > > > > > > > > > > >> > > > > > > top. > > > > > > > > > > > >> > > > > > > - Layer 3 (distributed scheduling and > > > > > > > > checkpointing) > > > > > > > > > > can > > > > > > > > > > > >> > largely > > > > > > > > > > > >> > > > > > proceed > > > > > > > > > > > >> > > > > > > independently in parallel. > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > So while each sub-FLIP is indeed a > > substantial > > > > > > piece > > > > > > > > of > > > > > > > > > > > work, > > > > > > > > > > > >> > most of > > > > > > > > > > > >> > > > > > them > > > > > > > > > > > >> > > > > > > can be advanced in parallel by different > > > > > > > contributors > > > > > > > > > once > > > > > > > > > > > the > > > > > > > > > > > >> > Layer > > > > > > > > > > > >> > > > 1 > > > > > > > > > > > >> > > > > > > primitives are settled. > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > 4. GPU scheduling roadmap > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > Could you expand a bit on which aspect > of > > GPU > > > > > > > > scheduling > > > > > > > > > > you > > > > > > > > > > > >> > have in > > > > > > > > > > > >> > > > mind > > > > > > > > > > > >> > > > > > > as the complex one? "GPU scheduling" > > covers a > > > > > > fairly > > > > > > > > > wide > > > > > > > > > > > >> surface > > > > > > > > > > > >> > > > area > > > > > > > > > > > >> > > > > > > (resource declaration, operator-level > > > > deployment, > > > > > > > > > elastic > > > > > > > > > > > >> > scaling, > > > > > > > > > > > >> > > > > > > heterogeneous GPU types, fine-grained > > > > > > partitioning, > > > > > > > > > etc.), > > > > > > > > > > > and > > > > > > > > > > > >> > the > > > > > > > > > > > >> > > > answer > > > > > > > > > > > >> > > > > > > differs quite a bit depending on which > > > > dimension > > > > > > > we're > > > > > > > > > > > >> > discussing. > > > > > > > > > > > >> > > > Once I > > > > > > > > > > > >> > > > > > > understand your specific concern I can > > give a > > > > more > > > > > > > > > useful > > > > > > > > > > > >> > response. > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > Thanks again for the support — looking > > > > forward to > > > > > > > the > > > > > > > > > > > continued > > > > > > > > > > > >> > > > > > discussion. > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > Best, > > > > > > > > > > > >> > > > > > > Guowei > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > On Tue, Apr 28, 2026 at 4:34 PM zl z < > > > > > > > > > > [email protected] > > > > > > > > > > > > > > > > > > > > > > > >> > wrote: > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > > Hey Guowei, > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > Thanks for the proposal, and I think > > this is > > > > > > very > > > > > > > > > > > valuable. I > > > > > > > > > > > >> > have > > > > > > > > > > > >> > > > some > > > > > > > > > > > >> > > > > > > > question about it: > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > 1. What are our expected throughput > and > > > > latency > > > > > > > > > targets? > > > > > > > > > > > Do > > > > > > > > > > > >> we > > > > > > > > > > > >> > > > have any > > > > > > > > > > > >> > > > > > > > forward-looking tests for this? > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > 2. AI involves a very large number of > > > > operators. > > > > > > > > > Besides > > > > > > > > > > > >> > allowing > > > > > > > > > > > >> > > > users > > > > > > > > > > > >> > > > > > > to > > > > > > > > > > > >> > > > > > > > use them through UDFs, will we also > > provide > > > > > > > commonly > > > > > > > > > > used > > > > > > > > > > > >> > built-in > > > > > > > > > > > >> > > > > > > > operators? > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > 3. Each of the 11 sub-FLIPs is a major > > > > project > > > > > > > > > > involving a > > > > > > > > > > > >> > > > significant > > > > > > > > > > > >> > > > > > > > amount of changes. What is our plan > for > > > > this? > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > 4. GPU scheduling is extremely > complex. > > > > What is > > > > > > > our > > > > > > > > > > > current > > > > > > > > > > > >> > > > roadmap for > > > > > > > > > > > >> > > > > > > > this? > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > This is a very high-quality and > exciting > > > > > > proposal. > > > > > > > > > > Making > > > > > > > > > > > >> > Flink an > > > > > > > > > > > >> > > > > > > > AI-native data processing engine will > > make > > > > it > > > > > > far > > > > > > > > more > > > > > > > > > > > >> > valuable in > > > > > > > > > > > >> > > > the > > > > > > > > > > > >> > > > > > AI > > > > > > > > > > > >> > > > > > > > era. Look forward to seeing it land > and > > > > come to > > > > > > > > > fruition > > > > > > > > > > > >> soon. > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > Robert Metzger <[email protected]> > > > > > > > 于2026年4月28日周二 > > > > > > > > > > > 14:38写道: > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > Hey Guowei, > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > >> > > > > > > > > Thanks for the proposal. I just > took a > > > > brief > > > > > > > look, > > > > > > > > > > here > > > > > > > > > > > are > > > > > > > > > > > >> > some > > > > > > > > > > > >> > > > high > > > > > > > > > > > >> > > > > > > > level > > > > > > > > > > > >> > > > > > > > > questions: > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > >> > > > > > > > > Regarding the RPC Operator: What is > > the > > > > > > > difference > > > > > > > > > to > > > > > > > > > > > the > > > > > > > > > > > >> > async > > > > > > > > > > > >> > > > io > > > > > > > > > > > >> > > > > > > > operator > > > > > > > > > > > >> > > > > > > > > we have already? > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > >> > > > > > > > > "Connector API for Multimodal Data > > > > > > Source/Sink": > > > > > > > > Why > > > > > > > > > > do > > > > > > > > > > > we > > > > > > > > > > > >> > need > > > > > > > > > > > >> > > > to > > > > > > > > > > > >> > > > > > > touch > > > > > > > > > > > >> > > > > > > > > the connector API for supporting > > > > multimodal > > > > > > > data? > > > > > > > > > > Isn't > > > > > > > > > > > >> this > > > > > > > > > > > >> > > > more of > > > > > > > > > > > >> > > > > > a > > > > > > > > > > > >> > > > > > > > > formats concern? > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > >> > > > > > > > > "Non-Disruptive Scaling for CPU > > > > Operators": > > > > > > How > > > > > > > do > > > > > > > > > you > > > > > > > > > > > want > > > > > > > > > > > >> > to > > > > > > > > > > > >> > > > > > > guarantee > > > > > > > > > > > >> > > > > > > > > exactly-once on that kind of > scaling? > > > > E.g. you > > > > > > > > need > > > > > > > > > to > > > > > > > > > > > >> > somehow > > > > > > > > > > > >> > > > make a > > > > > > > > > > > >> > > > > > > > > handover between the old and new new > > > > pipeline > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > >> > > > > > > > > Overall, I find the proposal has > some > > > > things > > > > > > > which > > > > > > > > > > seem > > > > > > > > > > > >> > related > > > > > > > > > > > >> > > > to > > > > > > > > > > > >> > > > > > > making > > > > > > > > > > > >> > > > > > > > > Flink more AI native, but other > > changes > > > > seem > > > > > > > > > > orthogonal > > > > > > > > > > > to > > > > > > > > > > > >> > that. > > > > > > > > > > > >> > > > For > > > > > > > > > > > >> > > > > > > > > example the checkpoint or scaling > > changes > > > > are > > > > > > > > > actually > > > > > > > > > > > >> > unrelated > > > > > > > > > > > >> > > > to > > > > > > > > > > > >> > > > > > AI, > > > > > > > > > > > >> > > > > > > > and > > > > > > > > > > > >> > > > > > > > > just engine improvements. > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > >> > > > > > > > > On Tue, Apr 28, 2026 at 5:48 AM > > Guowei Ma > > > > < > > > > > > > > > > > >> > [email protected]> > > > > > > > > > > > >> > > > > > > wrote: > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > Hi everyone, > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > I'd like to start a discussion on > an > > > > > > umbrella > > > > > > > > > > FLIP[1] > > > > > > > > > > > >> that > > > > > > > > > > > >> > lays > > > > > > > > > > > >> > > > > > out a > > > > > > > > > > > >> > > > > > > > > > direction for evolving Flink into > a > > data > > > > > > > engine > > > > > > > > > that > > > > > > > > > > > >> > natively > > > > > > > > > > > >> > > > > > > supports > > > > > > > > > > > >> > > > > > > > AI > > > > > > > > > > > >> > > > > > > > > > workloads. > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > The short version: user workloads > > are > > > > > > shifting > > > > > > > > > from > > > > > > > > > > BI > > > > > > > > > > > >> > > > analytics to > > > > > > > > > > > >> > > > > > > > > > multimodal data processing > centered > > on > > > > model > > > > > > > > > > > inference, > > > > > > > > > > > >> and > > > > > > > > > > > >> > > > this > > > > > > > > > > > >> > > > > > > > triggers > > > > > > > > > > > >> > > > > > > > > > cascading changes across the > stack — > > > > > > > multimodal > > > > > > > > > data > > > > > > > > > > > >> > flowing > > > > > > > > > > > >> > > > > > through > > > > > > > > > > > >> > > > > > > > > > pipelines, heterogeneous CPU/GPU > > > > resources, > > > > > > > > > > vectorized > > > > > > > > > > > >> > > > execution, > > > > > > > > > > > >> > > > > > and > > > > > > > > > > > >> > > > > > > > > > inference tasks that run for > > seconds to > > > > > > > minutes > > > > > > > > on > > > > > > > > > > > Spot > > > > > > > > > > > >> > > > instances. > > > > > > > > > > > >> > > > > > > The > > > > > > > > > > > >> > > > > > > > > > proposal sketches an evolution > along > > > > five > > > > > > > > > directions > > > > > > > > > > > >> > > > (development > > > > > > > > > > > >> > > > > > > > > paradigm, > > > > > > > > > > > >> > > > > > > > > > data model, heterogeneous > resources, > > > > > > execution > > > > > > > > > > engine, > > > > > > > > > > > >> > fault > > > > > > > > > > > >> > > > > > > > tolerance), > > > > > > > > > > > >> > > > > > > > > > decomposed into 11 sub-FLIPs > > organized > > > > into > > > > > > > > three > > > > > > > > > > > layers: > > > > > > > > > > > >> > core > > > > > > > > > > > >> > > > > > > runtime > > > > > > > > > > > >> > > > > > > > > > primitives, AI workload expression > > and > > > > > > > > execution, > > > > > > > > > > and > > > > > > > > > > > >> > > > > > > production-grade > > > > > > > > > > > >> > > > > > > > > > operational guarantees. Most > > sub-FLIPs > > > > have > > > > > > no > > > > > > > > > hard > > > > > > > > > > > >> > > > dependencies on > > > > > > > > > > > >> > > > > > > > each > > > > > > > > > > > >> > > > > > > > > > other and can be advanced in > > parallel. > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > A note on scope, since it's an > > umbrella: > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > - In scope here: whether the > > evolution > > > > > > > > directions > > > > > > > > > > are > > > > > > > > > > > >> > > > reasonable, > > > > > > > > > > > >> > > > > > > > whether > > > > > > > > > > > >> > > > > > > > > > each sub-FLIP's motivation and > > proposed > > > > > > > approach > > > > > > > > > are > > > > > > > > > > > >> > > > well-founded, > > > > > > > > > > > >> > > > > > > and > > > > > > > > > > > >> > > > > > > > > > whether the boundaries and > > dependencies > > > > > > > between > > > > > > > > > > > sub-FLIPs > > > > > > > > > > > >> > are > > > > > > > > > > > >> > > > > > clear. > > > > > > > > > > > >> > > > > > > > > > - Out of scope here: detailed > > designs, > > > > API > > > > > > > > > > specifics, > > > > > > > > > > > and > > > > > > > > > > > >> > > > > > > > implementation > > > > > > > > > > > >> > > > > > > > > > plans of individual sub-FLIPs — > > those > > > > will > > > > > > go > > > > > > > > > > through > > > > > > > > > > > >> > their own > > > > > > > > > > > >> > > > > > > FLIPs. > > > > > > > > > > > >> > > > > > > > > > - Consensus criteria: agreement on > > the > > > > > > overall > > > > > > > > > > > direction > > > > > > > > > > > >> is > > > > > > > > > > > >> > > > > > > sufficient > > > > > > > > > > > >> > > > > > > > > for > > > > > > > > > > > >> > > > > > > > > > the umbrella to pass; passing it > > does > > > > not > > > > > > lock > > > > > > > > in > > > > > > > > > > any > > > > > > > > > > > >> > > > sub-FLIP's > > > > > > > > > > > >> > > > > > > > design — > > > > > > > > > > > >> > > > > > > > > > sub-FLIPs may still be adjusted, > > > > deferred, > > > > > > or > > > > > > > > > > > withdrawn > > > > > > > > > > > >> as > > > > > > > > > > > >> > they > > > > > > > > > > > >> > > > > > > > progress. > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > All proposed changes are > > incremental — > > > > no > > > > > > > > existing > > > > > > > > > > > API or > > > > > > > > > > > >> > > > behavior > > > > > > > > > > > >> > > > > > is > > > > > > > > > > > >> > > > > > > > > > removed or altered. Compatibility > > > > details > > > > > > are > > > > > > > > > > covered > > > > > > > > > > > at > > > > > > > > > > > >> > the > > > > > > > > > > > >> > > > end of > > > > > > > > > > > >> > > > > > > the > > > > > > > > > > > >> > > > > > > > > > document. > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > Looking forward to your feedback > on > > the > > > > > > > overall > > > > > > > > > > > direction > > > > > > > > > > > >> > and > > > > > > > > > > > >> > > > the > > > > > > > > > > > >> > > > > > > > > layering. > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > [1] > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=421957275 > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > Thanks, > > > > > > > > > > > >> > > > > > > > > > Guowei > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
