Hi Guowei and all,

Big +1 on the overall direction of FLIP-577.
From the SQL ecosystem perspective, over the past year the community has introduced AI functions (e.g., ML_PREDICT, VECTOR_SEARCH) to help users leverage Flink in AI scenarios. Through community interactions and enterprise user feedback since then, we can clearly sense the growing expectation for Flink to go further — broader multimodal data processing capabilities, improved runtime stability under AI workloads, and more optimized execution for inference-heavy pipelines. This FLIP addresses exactly those gaps. In particular, I'm excited about:

- Multimodal data type extensions — first-class types like Tensor, Image, Embedding, and Audio in the type system will unlock native SQL/Table expressions over multimodal data, rather than forcing users to work around opaque BYTES columns.
- Object reference mechanism — a critical optimization for large-payload multimodal data. Passing multi-megabyte blobs through the standard data path is a known pain point today; a reference-based approach will significantly reduce serialization overhead and memory pressure.
- Richer built-in AI functions — standardizing common multimodal operations (embedding generation, similarity search, model inference) as engine-level functions will eliminate the repeated ad-hoc UDF implementations we see across organizations today, and open the door for engine-level optimizations like shared model handles and adaptive batching.

As Jark mentioned, we'd like to participate in and contribute to the relevant sub-FLIPs.

Best,
Lincoln Lee

Guowei Ma <[email protected]> wrote on Sunday, May 10, 2026 at 21:26:

> Hi all,
>
> Thanks again for the thoughtful feedback and the valuable perspectives shared in this discussion.
>
> I have updated FLIP-577 [1] based on the discussion in this thread. The overall direction remains the same, but I have tried to make the scope, motivation, and boundaries clearer.
>
> The main changes are:
>
> 1. Clarified the target workload as AI-oriented data processing workloads, instead of relying only on the broader "AI-Native" wording.
> 2. Added explicit non-goals to make clear that this proposal does not aim to turn Flink into an AI framework, ML platform, or model serving system.
> 3. Added "Why Now" and "Why Flink" sections to better explain the production signals, ecosystem trends, and why Flink's existing runtime strengths are relevant here.
> 4. Reworked the umbrella rationale. The key point is not that every sub-FLIP is AI-specific, but that these mechanisms need a shared runtime contract across data representation, service invocation, GPU resources, scaling, and recovery.
> 5. Clarified that engine-level primitives should consider SQL/Table, Java DataStream, and Python DataFrame.
> 6. Made the initial correctness scope of runtime mechanisms — non-disruptive scaling, UAC enhancements, and Pipeline Region checkpoints — more conservative, with explicit opt-in where default behavior is affected.
>
> I also tried to reflect the earlier questions raised in this thread in the corresponding sections of the updated document.
>
> Looking forward to the continued discussion.
>
> [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=421957275
>
> Best,
> Guowei
>
>
> On Sun, May 10, 2026 at 4:19 PM Xintong Song <[email protected]> wrote:
>
> > Hi all,
> >
> > First, big thanks to Guowei for kicking off this important discussion, and
> > to the community for the substantive engagement over the past days.
I'd > > also like to share some of my own thoughts. > > > > My apologies for joining late — I've been tied up with other things and > > only recently started catching up. Given the length of this thread, it's > > possible I haven't fully digested every detail. If I miss anything, > please > > bear with me and feel free to point it out. > > > > Let me state my position upfront: a big +1 to this FLIP. Based on the > > workload evolution I've been observing, I believe moving Flink toward > > AI-Native is necessary — this is a project- and community-level effort > that > > matters deeply for how Flink responds to the industry shift driven by AI, > > and how it captures the new opportunities that come with it. We may still > > need more discussion to align on specific technical details, but at the > > strategic level of where the community should head, I strongly support > this > > umbrella. > > > > Below are my thoughts on a few key questions raised in the discussion so > > far. > > > > > > 1. On the scope of the umbrella > > > > Robert, Yaroslav, and Martijn have all pointed out that some items in the > > FLIP — Pipeline Region independent checkpointing, UAC enhancements, > > non-disruptive scaling, etc. — are not unique to AI scenarios and would > > also apply elsewhere. That observation is fair. > > > > However, I'd argue the right criterion for whether something belongs > under > > this umbrella is not whether it is *exclusively* used for AI, but whether > > it is *essential* to the central goal here — AI-Native Flink — i.e., > > whether AI-Native Flink would actually fall short without it. > > > > From that angle, I agree with what Guowei has already laid out: under AI > > workload characteristics (per-record processing cost orders of magnitude > > higher, GPU resources expensive and scarce, long and long-tailed > > asynchronous inference), these otherwise-general mechanisms get pushed to > > the limit of their current implementations. They may be adequate for > > traditional BI/ETL workloads, but they become real bottlenecks under AI > > workloads. Without them, the AI-Native Flink story would be incomplete. > So > > I support including them under this umbrella. > > > > > > 2. On usage and maintenance complexity > > > > I understand the concerns Yaroslav and Martijn raised about complexity > > costs. That said, I don't think we should conservatively reject a > > capability because of complexity, if that capability is critical and > brings > > substantial benefits. On the necessity of RpcOperator specifically, Zhu > > Zhu, Guowei, and others have already provided answers earlier in this > > thread, and I'll add some context from the Flink Agents side below. > > > > As a side observation: with the rapid progress of AI-assisted development > > over the past year or two, the way engineers work is shifting in a > > noticeable way. On one hand, AI coding tools are helping developers > > maintain complex software systems at lower effort. On the other hand, > more > > and more users are starting to ask AI how to use Flink correctly, and > even > > letting AI help them develop and operate Flink workloads. Usage and > > maintenance complexity still matter, but their relative weight is > arguably > > going down — this is a trend worth factoring into long-term tradeoffs at > > the community level. > > > > > > 3. 
On "wait for the industry to stabilize before integrating" > > > > Yaroslav raised a concern about the timing — the LLM tooling landscape is > > changing fast, and it's hard to predict what will be needed tomorrow. I'd > > offer a different perspective. > > > > It's *precisely because* the industry is evolving fast and hasn't yet > > settled into a stable shape that Flink should join earlier — to influence > > and help define how that shape forms. If we just stand by waiting for the > > landscape to settle, by the time it does, there may not be a place for > > Flink in it, and we'd have missed an important window. > > > > In fact, projects like Daft and Ray Data are increasingly defining the de > > facto standards in this space, as Leonard pointed out earlier in this > > thread. If Flink doesn't engage proactively, even keeping up later will > put > > us in a reactive position. > > > > Engaging early does carry some cost of trial and error — some > capabilities > > may need to be deprecated or restructured down the line. But I'd argue > this > > is a normal part of how open-source projects evolve, and it's both > > necessary and worth it. > > > > > > 4. RpcOperator from a Flink Agents perspective > > > > The benefits of RpcOperator have already been articulated by several > people > > earlier in the thread. I'd like to add a concrete scenario from the Flink > > Agents side — the subagent pattern. > > > > > > 4.1 Brief intro to the subagent pattern > > > > In agent practice, after planning, the main agent often spawns one or > more > > subagents to handle specific subtasks. Each subagent has its own LLM > > context — its own system prompt, its own context window, its own toolset > > and permissions. The main agent only receives the final result from the > > subagent, not the intermediate steps. This pattern is now widely adopted > in > > the industry, and many agent skills explicitly say "spawn a subagent to > do > > X". > > > > Flink Agents wants to support this pattern for several reasons: > > compatibility with the existing agent skill ecosystem; context isolation, > > so that intermediate state during subagent execution doesn't pollute the > > main agent's context; and one additional advantage that enterprise > > production scenarios have over single-machine personal-assistant agents — > > the ability to dispatch resource-intensive subtasks to a shared resource > > pool for efficient resource reuse. > > > > > > 4.2 Why this needs RpcOperator instead of in-place execution inside the > > Flink Agents operator > > > > Why not just spawn subagents directly inside the Flink Agents operator? > > Because subagent workloads are highly bursty — within the same job, one > > minute's input might need several subagents for deep processing; the next > > minute's input might need none. Under that kind of randomness, statically > > allocating subagent resources per operator parallelism leaves no good > > answer: under-allocate and the operator gets blown out the moment demand > > arrives; over-allocate and resources sit idle most of the time. > > > > The natural answer is to put subagent execution into a shared resource > pool > > that can be used by multiple upstream operator subtasks, and that can > scale > > elastically based on load. This maps directly onto the design and core > > benefits of RpcOperator that Zhu Zhu laid out earlier — each RpcOperator > > instance forming its own Pipelined Region, which in turn enables > > independent scaling and flexible load balancing. 
> > > > > > 4.3 Why this shared resource pool should be inside Flink rather than > > deployed externally > > > > Two reasons. > > > > First, the prompts, toolsets, and permission configurations a subagent > > needs at execution time are essentially part of the Flink Agents *job > > definition*. Stripping them out of the job and deploying them separately > > effectively splits the job definition in two, leaving users to maintain a > > synchronization relationship between an external deployment and the Flink > > job. This goes against the developer experience Flink Agents aims to > > provide. > > > > Second, subagent execution needs to be incorporated into Flink's > checkpoint > > mechanism. The notion of "stateful" here deserves a brief clarification: > > each subagent task is self-contained — the full context it needs is > handed > > over by the main agent at dispatch time in one shot. That is, subagents > > don't need to share state across records, so at the task level they look > > stateless. However, executing a single subagent task takes a significant > > amount of time and involves multiple rounds of model calls and tool > calls, > > where tool calls may have externally observable side effects (sending > > emails, writing to external systems, etc.). This means we need failure > > recovery for the in-flight computation state during task execution, so > that > > subagent execution preserves exactly-once semantics across failovers. > > > > > > To wrap up, let me reiterate my support for the overall direction of this > > FLIP. There are clearly technical details still to be worked out, but I > > believe the direction described by this umbrella is both necessary and > > important for the long-term evolution of the Flink community. Looking > > forward to continued discussion in the sub-FLIPs ahead. > > > > Thanks! > > > > Best, > > > > Xintong > > > > > > > > On Fri, May 8, 2026 at 3:43 PM Guowei Ma <[email protected]> wrote: > > > > > Hi all, > > > > > > Thanks everyone for the in-depth discussion over the past few days. Let > > me > > > first summarize my understanding of the discussion so far. I see strong > > > interest in making Flink better support AI-oriented data processing > > > workloads, especially multimodal and inference-oriented pipelines. At > the > > > same time, a recurring concern is that RpcOperator and some of the > Layer > > 3 > > > runtime improvements also look like general Flink capabilities, so why > > > should they be discussed together under the AI-Native / multimodal > > > processing umbrella? > > > > > > I think the key question is not whether these mechanisms can only be > used > > > for AI scenarios, but whether they jointly form the runtime contract > > Flink > > > needs to support AI-oriented data processing workloads, especially > > > multimodal inference pipelines. Capabilities such as RpcOperator, > > > checkpointing, scaling, and resource management can certainly serve > > broader > > > use cases. However, under this class of workloads, they require more > > > consistent runtime semantics and coordinated design, and together > > determine > > > whether the system can provide production-grade execution, recovery, > and > > > elasticity. > > > > > > Compared with traditional streaming workloads, this class of workloads > > > changes the data shape, computation pattern, and resource model > > > significantly. 
> > > Data is no longer only small structured records; it may be images, video, audio, tensors, embeddings, or object references to external large objects. Many pipelines also have a relatively shuffle-light shape, such as URI → preprocessing → inference → sink. The computation logic often includes model inference or service-style invocation, either as remote inference service invocation or local GPU / accelerator-backed execution. Therefore, the system needs to handle not only ordinary RPC calls, but also service discovery, backpressure propagation, batching / concurrency control, timeout / retry, in-flight request draining, model loading, GPU warmup, resource scheduling, and fault recovery.
> > >
> > > From this perspective, RpcOperator is not just “another way to call an external service,” nor is it merely a deployment mechanism for GPU operators. More importantly, it defines a service-style operator abstraction: when inference becomes part of the logical data flow, Flink needs to understand and coordinate these runtime semantics, rather than hiding inference completely behind external black-box calls inside user code.
> > >
> > > Some of the Layer 3 runtime improvements follow the same logic. While mechanisms such as checkpointing or scaling are not exclusive to AI workloads, inference-oriented workloads fundamentally change their operational assumptions and cost model, making runtime behavior far more critical than in traditional data processing systems. When per-record computation is expensive, GPU warmup and model loading are costly, and the execution environment may involve elastic / preemptible resources, the cost of global rollback or disruptive scaling becomes much higher than in traditional row-at-a-time BI / ETL workloads. As a result, runtime behavior that may have been acceptable for traditional workloads can directly affect stability and resource efficiency in inference-oriented workloads.
> > >
> > > At the same time, the umbrella proposal helps provide a shared context for discussing how these capabilities relate to each other and what common runtime assumptions they rely on. The more important value of the umbrella is to align on the workload model, design principles, boundaries, and dependencies between capabilities, so that independently evolving pieces such as RpcOperator, GPU resource declaration, batching / concurrency control, non-disruptive scaling, and regional checkpointing do not end up with inconsistent runtime semantics.
> > >
> > > Based on this discussion, I will update the proposal to make the workload model, RpcOperator boundary, and Layer 3 dependency relationship clearer.
> > >
> > > Best,
> > > Guowei
> > >
> > > On Fri, May 8, 2026 at 12:37 PM zhangjiaogg <[email protected]> wrote:
> > >
> > > > Hi Guowei and all,
> > > >
> > > > Thank you for driving this initiative. Strong +1 on the overall direction.
> > > >
> > > > From our perspective, the core value of this proposal lies in two areas: extending Flink's intelligent processing capabilities for multimodal data, and enabling native, in-pipeline local inference.
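> > > >
> > > > As a rough illustration of what "in-pipeline" means here (a sketch only: ObjectStore, FrameExtractor, VisionModel, and the source/sink names are made up, not existing Flink APIs), the whole workflow collapses into a single checkpointed job:
> > > >
> > > >     import org.apache.flink.api.common.eventtime.WatermarkStrategy;
> > > >     import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
> > > >
> > > >     StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
> > > >
> > > >     env.fromSource(assetUriSource, WatermarkStrategy.noWatermarks(), "asset-uris")
> > > >        .map(uri -> ObjectStore.fetch(uri))   // pull the raw object by reference
> > > >        .flatMap(new FrameExtractor(5))       // frame extraction (5 fps)
> > > >        .filter(f -> f.qualityScore() > 0.8f) // quality filtering
> > > >        .map(f -> VisionModel.annotate(f))    // in-pipeline (GPU-backed) inference
> > > >        .sinkTo(trainingLakeSink);            // structured samples for training
> > > >
> > > >     env.execute("end-to-end multimodal preparation");
> > > >
> > > > No intermediate storage between systems, and one fault-tolerance domain.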
> > > > As AI capabilities continue to advance, multimodal data is accounting for an increasingly large share of overall data volume, and the ability to perform intelligent, real-time processing on this data — not just ingestion or routing, but actual inference and transformation within the stream — is becoming a critical requirement across industries. Today, most pipelines treat multimodal objects as opaque blobs and push inference to external systems, which works but at the cost of complexity, latency, and consistency.
> > > >
> > > > This is exactly the pain one of our customers experiences in their autonomous driving data pipelines. Video and image data captured by onboard cameras must go through annotation, frame extraction, quality filtering, and both unstructured and structured data transformation before it can be used for model training — a workflow that today requires combining multiple specialized systems: a stream engine for structured processing, a separate framework (e.g., Ray Data) for multimodal processing, and an external model serving layer for inference. Each system boundary introduces intermediate storage, operational overhead, and data consistency challenges across the full pipeline. If Flink can handle this end-to-end — with native multimodal types, local GPU inference operators, and unified checkpointing — the entire workflow becomes a single Flink job, intermediate storage is eliminated, and fault recovery covers the pipeline as a whole.
> > > >
> > > > We look forward to the sub-FLIP discussions and would be happy to contribute.
> > > >
> > > > Best regards,
> > > > Jiao Zhang
> > > >
> > > > At 2026-05-07 12:58:15, "zihao chen" <[email protected]> wrote:
> > > > > Hi all,
> > > > >
> > > > > I'd like to share some thoughts based on our internal experience with AI workloads on Flink.
> > > > >
> > > > > At Tencent, we have production scenarios where Flink is used in AI-related pipelines.
> > > > >
> > > > > Based on these workloads, we explored elasticity and autoscaling for cloud-native stream processing systems and published our experience in SIGMOD 2025:
> > > > >
> > > > > "Oceanus: Enable SLO-Aware Vertical Autoscaling for Cloud-Native Streaming Services in Tencent" [1]
> > > > >
> > > > > As our workloads evolved, we also started to see increasing GPU-based training and inference scenarios.
> > > > >
> > > > > Our current solution integrates Flink with external GPU services. While this works functionally, it also introduces several practical issues, such as:
> > > > >
> > > > > - fragmented lifecycle management
> > > > > - operational complexity
> > > > > - inconsistent scaling/recovery behavior across systems
> > > > >
> > > > > From this perspective, I think FLIP-577 is exploring a very meaningful direction.
> > > > >
> > > > > In particular, I agree with the idea of integrating GPU-backed computation more naturally into Flink's runtime model, instead of treating it purely as an external service integration problem.
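> > > > >
> > > > > For reference, the building block Flink already has is the FLIP-108 external resource framework: a TaskManager requests GPUs via configuration (external-resources: gpu; external-resource.gpu.amount: 1) and operators discover them at runtime. A sketch, where only the LocalModel runtime is made up:
> > > > >
> > > > >     import java.util.Set;
> > > > >     import org.apache.flink.api.common.externalresource.ExternalResourceInfo;
> > > > >     import org.apache.flink.api.common.functions.RichMapFunction;
> > > > >
> > > > >     public class GpuEmbedFn extends RichMapFunction<Image, float[]> {
> > > > >       @Override
> > > > >       public float[] map(Image img) {
> > > > >         // Discover the GPU(s) granted to this TaskManager (FLIP-108 API).
> > > > >         Set<ExternalResourceInfo> gpus =
> > > > >             getRuntimeContext().getExternalResourceInfos("gpu");
> > > > >         String device = gpus.iterator().next()
> > > > >             .getProperty("index").orElse("0"); // device index from the GPU driver
> > > > >         return LocalModel.on(device).embed(img); // hypothetical model runtime
> > > > >       }
> > > > >     }
> > > > >
> > > > > But this stops at resource discovery; lifecycle, warmup, scaling, and recovery of the GPU-backed computation still live outside Flink's runtime model.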
> > > > > Besides, from the elasticity perspective, our experience is that GPU workloads have very different characteristics compared with traditional CPU workloads:
> > > > >
> > > > > - GPU resources are expensive and scarce
> > > > > - Startup and replay costs are significantly higher
> > > > > - Long-running tasks make scaling and recovery more challenging
> > > > >
> > > > > In our experience, GPU elasticity cannot simply reuse the assumptions behind traditional CPU elasticity.
> > > > >
> > > > > Because of this, elasticity becomes especially important for production AI workloads, not only for resource efficiency, but also for reducing scaling and recovery overhead.
> > > > >
> > > > > More broadly, AI workloads increasingly require Flink to collaborate more naturally with GPU-backed computation, and I believe FLIP-577 is exploring an important direction toward addressing this gap.
> > > > >
> > > > > Overall, I'm looking forward to further discussions about this FLIP.
> > > > >
> > > > > [1] https://dl.acm.org/doi/abs/10.1145/3722212.3724445
> > > > >
> > > > > Best,
> > > > > Zihao Chen
> > > > >
> > > > > Yong Fang <[email protected]> wrote on Thursday, May 7, 2026 at 11:11:
> > > > >
> > > > >> Hi devs,
> > > > >>
> > > > >> Thanks Guowei for initiating this proposal. I think this is an important step for Flink towards the era of AI data processing, very big +1.
> > > > >>
> > > > >> I'd like to share some scenarios and requirements of leveraging PyFlink for AI data processing at ByteDance. Currently, we run tens of thousands of PyFlink/Flink jobs, using millions of CPU cores.
> > > > >>
> > > > >> 1) Multimodal Data Processing
> > > > >> We want to use PyFlink to generate multimodal feature data. The typical workflow starts by reading ID-based raw data and performing large table joins and ETL computations. We then fetch multimodal assets such as images, videos, texts and audios from object storage by ID. These multimodal data are either sent to an RPC service (backed by local models or remote large models), or processed via local GPU computing for frame extraction, embedding generation and other tasks. After multimodal computation, output results including embeddings, processed images and multimodal metadata are generated and persisted into the downstream multimodal data lake.
> > > > >>
> > > > >> 2) Stream-Batch Unified Data Training
> > > > >> We use PyFlink to consume processed sample data from MQ or data lakes. Within the data pipeline, data may be shuffled by key, then fed into parameter servers or local services for CPU-based or GPU-based model training. Such workloads strongly demand optimized CPU & GPU hybrid scheduling, worker node restart capability, fast scaling, as well as native support for unified streaming and batch training.
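> > > > >>
> > > > >> To make scenario 1) concrete, the per-record shape, written here in Java DataStream terms for brevity (ObjectStore, RemoteModel, LocalGpu, and the sink are our names, not Flink APIs), is roughly:
> > > > >>
> > > > >>     import java.util.Collections;
> > > > >>     import java.util.concurrent.CompletableFuture;
> > > > >>     import java.util.concurrent.TimeUnit;
> > > > >>     import org.apache.flink.streaming.api.datastream.AsyncDataStream;
> > > > >>     import org.apache.flink.streaming.api.functions.async.AsyncFunction;
> > > > >>     import org.apache.flink.streaming.api.functions.async.ResultFuture;
> > > > >>     import org.apache.flink.types.Row;
> > > > >>
> > > > >>     DataStream<Feature> features = AsyncDataStream.orderedWait(
> > > > >>         joinedIds,                               // ID-based rows after joins / ETL
> > > > >>         new AsyncFunction<Row, Feature>() {
> > > > >>           @Override
> > > > >>           public void asyncInvoke(Row row, ResultFuture<Feature> out) {
> > > > >>             CompletableFuture
> > > > >>                 .supplyAsync(() -> ObjectStore.get(row.<String>getFieldAs("asset_id")))
> > > > >>                 .thenApply(a -> a.sizeBytes() < (4 << 20)  // 4 MiB cutover
> > > > >>                     ? RemoteModel.embed(a)       // small asset: RPC model service
> > > > >>                     : LocalGpu.embed(a))         // large asset: local GPU path
> > > > >>                 .thenAccept(f -> out.complete(Collections.singletonList(f)));
> > > > >>           }
> > > > >>         },
> > > > >>         120, TimeUnit.SECONDS, 256);             // long-tail budget + in-flight cap
> > > > >>
> > > > >>     features.sinkTo(multimodalLakeSink);         // embeddings + metadata to the lake
> > > > >>
> > > > >> A Python DataFrame surface should eventually express this shape natively, which motivates the common requirements below.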
> > > > >> > > > > >> While supporting the above two categories of workloads, we have > > > several > > > > >> common requirements for PyFlink and Flink Core: > > > > >> > > > > >> 1) Native Python Computing Capability > > > > >> We need more user-friendly DataFrame APIs, comprehensive built-in > > data > > > > >> types for image and audio, as well as richer multimodal computing > > > > >> operators. This enables users to develop multimodal data > processing > > > jobs > > > > >> more efficiently, and allows better optimization and scheduling at > > the > > > > >> execution plan layer. > > > > >> > > > > >> 2) CPU & GPU Scheduling and Resource Management > > > > >> It is expected to tag resource requirements at fine-grained levels > > > such > > > > as > > > > >> python user-defined functions. The scheduler should provide > enhanced > > > > >> resource orchestration and task scheduling, enabling more flexible > > > > >> heterogeneous resource management. > > > > >> > > > > >> 3) Embedded Server Node Capability in Pipeline > > > > >> We hope to launch dedicated server nodes inside a single pipeline, > > to > > > > load > > > > >> local models or access remote models, which may be shared between > > > > different > > > > >> operators in the pipeline. This unifies data ETL and multimodal > > > > computing > > > > >> within one end-to-end pipeline, greatly simplifying operation and > > > > >> maintenance for business teams. > > > > >> > > > > >> 4) Performance and Stability Optimization > > > > >> Key enhancements include zero-copy data transfer between Flink TM > > > > processes > > > > >> and Python processes, fast scaling of compute nodes, fast > > > checkpointing > > > > >> mechanism, and shuffle optimization for large-scale datasets. > These > > > > >> improvements will significantly boost the performance and > stability > > of > > > > >> PyFlink multimodal workloads. > > > > >> > > > > >> I’m really excited to see that FLIP-577 has covered all the real > > > > production > > > > >> scenarios and requirements mentioned above. It's a good chance to > > > > iterate > > > > >> and enrich the core capabilities of PyFlink and Flink Core > targeting > > > > these > > > > >> AI data processing scenarios, and build Flink into a first-class > AI > > > data > > > > >> processing engine. > > > > >> > > > > >> I’m very much looking forward to the progress. > > > > >> > > > > >> Best, > > > > >> FangYong > > > > >> > > > > >> On Wed, May 6, 2026 at 8:36 PM FeatZhang <[email protected]> > > > wrote: > > > > >> > > > > >> > Hi all, > > > > >> > > > > > >> > This is a great topic, and honestly long overdue. > > > > >> > > > > > >> > With the rapid growth of AI applications, we have been seeing a > > > > >> > significant increase in real-world demands from users who are > > > already > > > > >> > building on Flink and other traditional data processing or BI > > > engines. > > > > >> > From a platform perspective, more and more teams are trying to > > > > >> > integrate AI capabilities directly into their existing streaming > > > > >> > pipelines, rather than treating them as separate systems. > > > > >> > > > > > >> > This is not an isolated trend — it is becoming a common > > requirement > > > > >> > across industries. > > > > >> > > > > > >> > ________________________________ > > > > >> > > > > > >> > 1. 
What is changing in real systems > > > > >> > > > > > >> > We are observing a consistent shift: > > > > >> > > > > > >> > AI is moving from offline analysis or request-time scoring to > > > > >> > continuous, event-driven decision making. > > > > >> > > > > > >> > In other words: > > > > >> > > > > > >> > AI is becoming part of the data stream itself. > > > > >> > > > > > >> > ________________________________ > > > > >> > > > > > >> > 2. Representative production scenarios > > > > >> > > > > > >> > 2.1 Real-time fraud detection (per-event decision under strict > > > > latency) > > > > >> > > > > > >> > Typical setup > > > > >> > > > > > >> > Continuous transaction stream (payments, logins, transfers) > > > > >> > Each event must be evaluated within milliseconds > > > > >> > Decision depends on: > > > > >> > > > > > >> > recent user behavior > > > > >> > device / IP patterns > > > > >> > short-term aggregates > > > > >> > > > > > >> > What is happening in practice > > > > >> > > > > > >> > Models are already deeply integrated into decision flow > > > > >> > Feature freshness directly impacts detection accuracy > > > > >> > > > > > >> > Current pain > > > > >> > > > > > >> > Features computed in streaming, but inference is remote > > > > >> > Network overhead adds to critical path latency > > > > >> > Hard to ensure training/serving consistency > > > > >> > > > > > >> > ________________________________ > > > > >> > > > > > >> > 2.2 Real-time recommendation and ads (continuous re-ranking) > > > > >> > > > > > >> > Typical setup > > > > >> > > > > > >> > User interaction stream (click, view, dwell time) > > > > >> > Continuous feature updates (session + short-term behavior) > > > > >> > Inference triggered per interaction > > > > >> > > > > > >> > What is happening in practice > > > > >> > > > > > >> > Increasing reliance on real-time context > > > > >> > Model-based ranking becomes core logic > > > > >> > > > > > >> > Current pain > > > > >> > > > > > >> > Offline and online feature pipelines diverge > > > > >> > Training-serving skew is common > > > > >> > Inference orchestration is ad-hoc > > > > >> > > > > > >> > ________________________________ > > > > >> > > > > > >> > 2.3 Streaming RAG / knowledge systems (continuous indexing) > > > > >> > > > > > >> > Typical setup > > > > >> > > > > > >> > Continuous ingestion of documents, logs, or knowledge > > > > >> > Pipeline: > > > > >> > > > > > >> > chunking → embedding → indexing → retrieval > > > > >> > > > > > >> > Typical use cases > > > > >> > > > > > >> > AI copilots > > > > >> > enterprise knowledge assistants > > > > >> > observability systems > > > > >> > > > > > >> > Current pain > > > > >> > > > > > >> > Built via loosely coupled services or scripts > > > > >> > No strong consistency guarantees > > > > >> > Difficult to scale and recover > > > > >> > > > > > >> > ________________________________ > > > > >> > > > > > >> > 2.4 Real-time feedback loop (continuous evaluation) > > > > >> > > > > > >> > Typical setup > > > > >> > > > > > >> > Prediction at time T > > > > >> > Label arrives at T + Δ > > > > >> > > > > > >> > Required processing > > > > >> > > > > > >> > prediction stream JOIN label stream → metrics → optimization > > > > >> > > > > > >> > Current pain > > > > >> > > > > > >> > Alignment logic duplicated across systems > > > > >> > Late data handling is complex > > > > >> > No reusable evaluation abstraction > > > > >> > > > > > >> > ________________________________ > > > > >> > > > > > >> > 2.5 AI replacing rule-based 
decision logic > > > > >> > > > > > >> > Evolution > > > > >> > > > > > >> > From: > > > > >> > > > > > >> > rule engine / CEP > > > > >> > > > > > >> > To: > > > > >> > > > > > >> > model / LLM → decision > > > > >> > > > > > >> > Implication > > > > >> > > > > > >> > AI is becoming the core decision layer inside streaming systems. > > > > >> > > > > > >> > ________________________________ > > > > >> > > > > > >> > 3. Architectural shift > > > > >> > > > > > >> > Across all scenarios: > > > > >> > > > > > >> > From: > > > > >> > > > > > >> > data processing → feature system → model serving → evaluation > > > > >> > > > > > >> > To: > > > > >> > > > > > >> > stream = feature + inference + decision + feedback > > > > >> > > > > > >> > This reflects a fundamental change in system boundaries. > > > > >> > > > > > >> > ________________________________ > > > > >> > > > > > >> > 4. Why externalized architectures break down > > > > >> > > > > > >> > Most current implementations rely on multiple systems: > > > > >> > > > > > >> > stream processing > > > > >> > feature store > > > > >> > model serving > > > > >> > vector database > > > > >> > > > > > >> > This introduces several fundamental issues in real-time > scenarios. > > > > >> > > > > > >> > 4.1 Latency dominated by system boundaries > > > > >> > > > > > >> > stream → network → model service → response > > > > >> > > > > > >> > network overhead is unavoidable > > > > >> > batching is not controlled by the stream runtime > > > > >> > no end-to-end backpressure > > > > >> > > > > > >> > Latency becomes a system-level artifact rather than a compute > > > > property. > > > > >> > > > > > >> > ________________________________ > > > > >> > > > > > >> > 4.2 Inconsistent data between training and serving > > > > >> > > > > > >> > offline vs online features > > > > >> > different definitions or time windows > > > > >> > > > > > >> > Models operate on inconsistent data distributions. > > > > >> > > > > > >> > ________________________________ > > > > >> > > > > > >> > 4.3 State fragmentation > > > > >> > > > > > >> > user/session context must be rebuilt or fetched > > > > >> > loss of data locality > > > > >> > processing becomes call-driven > > > > >> > > > > > >> > ________________________________ > > > > >> > > > > > >> > 4.4 Feedback loop is not composable > > > > >> > > > > > >> > difficult alignment of prediction and label streams > > > > >> > no unified handling of late data > > > > >> > duplicated evaluation logic > > > > >> > > > > > >> > ________________________________ > > > > >> > > > > > >> > 4.5 Operational complexity > > > > >> > > > > > >> > multiple systems to scale > > > > >> > multiple failure domains > > > > >> > complex debugging paths > > > > >> > > > > > >> > ________________________________ > > > > >> > > > > > >> > 5. Why this aligns with Flink > > > > >> > > > > > >> > These workloads require: > > > > >> > > > > > >> > event-driven execution > > > > >> > strong state management > > > > >> > precise time semantics > > > > >> > continuous feedback > > > > >> > > > > > >> > These are exactly Flink’s core strengths. > > > > >> > > > > > >> > The key insight is: > > > > >> > > > > > >> > Inference should be modeled as a dataflow operator, not an > > external > > > > >> > service. > > > > >> > > > > > >> > ________________________________ > > > > >> > > > > > >> > 6. 
Implication > > > > >> > > > > > >> > If we model: > > > > >> > > > > > >> > data → feature → inference → decision → feedback > > > > >> > > > > > >> > within Flink, we can achieve: > > > > >> > > > > > >> > unified scheduling > > > > >> > shared state > > > > >> > consistent time semantics > > > > >> > end-to-end fault tolerance > > > > >> > > > > > >> > ________________________________ > > > > >> > > > > > >> > 7. Conclusion > > > > >> > > > > > >> > This is not simply about adding AI support to Flink. > > > > >> > > > > > >> > It is about recognizing that: > > > > >> > > > > > >> > Real-time AI systems are fundamentally streaming systems. > > > > >> > > > > > >> > The question is whether Flink evolves to support this natively, > or > > > > >> > remains a preprocessing layer in front of external AI stacks. > > > > >> > > > > > >> > ________________________________ > > > > >> > > > > > >> > Happy to follow up with a more concrete proposal (e.g., > inference > > > > >> > operator abstraction) if there is interest. > > > > >> > > > > > >> > Thanks. > > > > >> > > > > > >> > On Mon, May 4, 2026 at 10:31 PM Gen Luo <[email protected]> > > > wrote: > > > > >> > > > > > > >> > > Hi all, > > > > >> > > > > > > >> > > Thank you Guowei Ma for driving this discussion, and thanks > > > everyone > > > > >> for > > > > >> > > the valuable insights. Inspired by this exchange, I’d like to > > > share > > > > a > > > > >> few > > > > >> > > thoughts. > > > > >> > > > > > > >> > > While “AI-Native” covers broad ground, I believe this FLIP > does > > > not > > > > >> > > overextend Flink’s scope. It’s a necessary iteration driven by > > > > evolving > > > > >> > > user scenarios and AI advancements, particularly multimodal > > > > processing. > > > > >> > > Given the growing adoption of multimodal applications and > > > increasing > > > > >> > > interest in low-latency inference, initiating these > enhancements > > > is > > > > a > > > > >> > > timely step to better align Flink with evolving AI workloads. > > > > >> > > > > > > >> > > From our engagements with customers and developers, we > observe a > > > > clear > > > > >> > > shift in both workloads and user expectations. Model inference > > is > > > > >> > > increasingly central to data pipelines, with multimodal AI > tasks > > > > >> growing > > > > >> > > rapidly. Traditional real-time scenarios (e.g., monitoring and > > > > >> analytics) > > > > >> > > now leverage models and agent frameworks like Flink Agent for > > > > >> > intelligent, > > > > >> > > multi-turn decision-making, while large-scale offline compute > is > > > > also > > > > >> > > shifting toward LLMs and vision models. Alongside this > workload > > > > >> > evolution, > > > > >> > > developer workflows have adapted: AI practitioners naturally > > > prefer > > > > >> > Python > > > > >> > > and DataFrame-style APIs. As AI-assisted coding matures, > > aligning > > > > >> system > > > > >> > > interfaces with these familiar patterns will directly improve > > > > >> > AI-generated > > > > >> > > code quality and significantly lower adoption barriers for the > > AI > > > > >> > community. > > > > >> > > > > > > >> > > Today, many AI evaluation tools don’t yet recommend Flink for > AI > > > > >> > > workloads—largely due to limited visibility of Flink’s > relevant > > > > >> > > capabilities rather than fundamental incompatibility. In > > reality, > > > > Flink > > > > >> > has > > > > >> > > unique strengths here. 
For example, generating multimodal samples is often a multi-day, GPU-heavy process. Flink’s streaming model, combined with checkpointing and reduced disk I/O, is well-suited for such long-running tasks—a direction also pursued by engines like Daft and Ray Data. With Flink’s proven production stability, we’re well-positioned for both batch and future real-time multimodal streaming inference. Targeted improvements can make these advantages visible, driving better user experiences and healthier ecosystem growth.
> > > > >> > >
> > > > >> > > I’d also note a lesson from FlinkML. It attempted to cover model training but struggled to align with the fast-iteration, Python/notebook-centric workflows preferred by ML researchers. Flink’s core strength lies in high-concurrency, production-grade inference orchestration—not training lifecycle management (e.g., experiment tracking, versioning). This mismatch limited its adoption.
> > > > >> > >
> > > > >> > > This proposal, however, takes a different path. It doesn’t aim to replace training frameworks. Instead, it introduces modern AI concepts (multimodal data, LLMs) as first-class citizens for inference, built atop Flink’s computation strengths. Think Ray Data’s scope (plus simple co-located serving), not Train/Tune. Crucially, unlike the FlinkML era, today’s models use standardized interfaces and mature serving frameworks, allowing Flink to integrate external models seamlessly without heavy customization—significantly lowering project risk.
> > > > >> > >
> > > > >> > > This FLIP marks another starting point for Flink in the AI era. While details need refinement, I believe this direction aligns with both current and future user needs and Flink’s evolution.
> > > > >> > >
> > > > >> > > Best,
> > > > >> > > Gen
> > > > >> > >
> > > > >> > > On Mon, May 4, 2026 at 12:15 PM Jark Wu <[email protected]> wrote:
> > > > >> > >
> > > > >> > > > Hi Guowei,
> > > > >> > > >
> > > > >> > > > Thanks for driving this. +1 on the overall direction. Flink's streaming processing and checkpoint mechanism give it a structural advantage over systems like Daft and Ray. But today, these runtime strengths are held back by gaps in Python API, GPU scheduling, and native multimodal data handling. This umbrella FLIP addresses exactly that gap, comprehensively and systematically. I believe multimodal data processing is the biggest opportunity for traditional data infra to transition into AI infra, and this is one of the most important FLIPs for Flink in the AI era.
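> > > > >> > > >
> > > > >> > > > To ground this in today's SQL surface: with the recent AI-function work (FLIP-437-era syntax; the model options below are illustrative, not a spec), an inference step already reads roughly like this, and the gap is visible in the type system, where the image column is still opaque BYTES:
> > > > >> > > >
> > > > >> > > >     import org.apache.flink.table.api.EnvironmentSettings;
> > > > >> > > >     import org.apache.flink.table.api.TableEnvironment;
> > > > >> > > >
> > > > >> > > >     TableEnvironment tEnv =
> > > > >> > > >         TableEnvironment.create(EnvironmentSettings.inStreamingMode());
> > > > >> > > >
> > > > >> > > >     tEnv.executeSql(
> > > > >> > > >         "CREATE MODEL caption_model" +
> > > > >> > > >         "  INPUT (image BYTES)" +     // would become IMAGE with first-class types
> > > > >> > > >         "  OUTPUT (caption STRING)" +
> > > > >> > > >         "  WITH ('provider' = '...', 'endpoint' = '...')");
> > > > >> > > >
> > > > >> > > >     tEnv.executeSql(
> > > > >> > > >         "INSERT INTO captions " +
> > > > >> > > >         "SELECT id, caption " +
> > > > >> > > >         "FROM ML_PREDICT(TABLE images, MODEL caption_model, DESCRIPTOR(image))");
> > > > >> > > >
> > > > >> > > > With first-class types, that INPUT column becomes IMAGE, and the built-in functions can compose over it.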
> > > > >> > > > > > > > >> > > > As one of the Table/SQL module maintainers, we would like to > > > > >> > > > contribute the built-in multimodal processing UDFs (audio, > > > video, > > > > >> > > > image, text) and native multimodal data types (Tensor, > Image, > > > > >> > > > Embedding, etc.) as first-class citizens in the type system. > > > > Looking > > > > >> > > > forward to the sub-FLIP discussions. > > > > >> > > > > > > > >> > > > Best, > > > > >> > > > Jark > > > > >> > > > > > > > >> > > > On Thu, 30 Apr 2026 at 18:42, Guowei Ma < > [email protected] > > > > > > > >> wrote: > > > > >> > > > > > > > > >> > > > > Hi,Yaroslav > > > > >> > > > > > > > > >> > > > > Thanks for taking the time to write this detailed > feedback. > > > Let > > > > me > > > > >> > > > clarify > > > > >> > > > > the intent of the proposal first. > > > > >> > > > > > > > > >> > > > > I am not saying that Flink should become an AI framework, > an > > > ML > > > > >> > platform, > > > > >> > > > > or a model serving system. The way I use "AI-Native" in > this > > > > >> > proposal is > > > > >> > > > to > > > > >> > > > > say that Flink should support, as first-class citizens, > the > > > core > > > > >> > objects > > > > >> > > > > and execution patterns that frequently show up in > > AI-oriented > > > > data > > > > >> > > > > processing — instead of leaving them entirely to external > > > > systems > > > > >> or > > > > >> > ad > > > > >> > > > hoc > > > > >> > > > > user-defined integrations. > > > > >> > > > > > > > > >> > > > > These objects and execution patterns include: > > > > >> > > > > > > > > >> > > > > - Multimodal and unstructured data objects such as > > images, > > > > >> video, > > > > >> > > > audio, > > > > >> > > > > tensors, embeddings, and object references. > > > > >> > > > > - Model inference as part of the data flow, rather than > > an > > > > >> > entirely > > > > >> > > > > external black-box service call. > > > > >> > > > > - Operators backed by heterogeneous resources such as > > GPUs. > > > > >> > > > > - Pythonic and vectorized processing styles. > > > > >> > > > > - Long-running, long-tailed asynchronous computation. > > > > >> > > > > > > > > >> > > > > "AI-Native" is just a shorthand here, meaning that Flink > > > should > > > > >> > natively > > > > >> > > > > understand and support the core abstractions of this class > > of > > > > >> > workloads. > > > > >> > > > > The FLIP needs to make the target workload class clearer. > > What > > > > we > > > > >> > care > > > > >> > > > > about is not any specific model paradigm — LLM, CV, > > > > recommendation, > > > > >> > or > > > > >> > > > > traditional ML inference — but a class of data processing > > > > workloads > > > > >> > with > > > > >> > > > > shared runtime and topology characteristics: > > > > >> > > > > > > > > >> > > > > - A single computation may take seconds or even > minutes, > > > > instead > > > > >> > of > > > > >> > > > > microseconds as in traditional row-at-a-time > processing. > > > > >> > > > > - Execution often involves heterogeneous resources such > > as > > > > CPU + > > > > >> > GPU, > > > > >> > > > > where GPUs are expensive and scarce. > > > > >> > > > > - Data is often multimodal large objects (images, > video, > > > > audio, > > > > >> > > > tensors, > > > > >> > > > > embeddings), rather than structured small records. > > > > >> > > > > - Computation logic often includes model inference or > > > > >> > service-style > > > > >> > > > > invocations as part of the pipeline. 
> > > > >> > > > > - Many target topologies are relatively shuffle-light > and > > > > don't > > > > >> > > > > necessarily involve complex keyed-state migration, e.g. > > > URI → > > > > >> > > > preprocessing > > > > >> > > > > → inference → sink. > > > > >> > > > > > > > > >> > > > > Ten years ago, many ML workloads took the form of offline > > > > training > > > > >> > plus > > > > >> > > > > online feature serving. Flink already played a strong role > > in > > > > >> feature > > > > >> > > > > engineering, streaming feature computation, and real-time > > data > > > > >> > > > preparation, > > > > >> > > > > so there was no strong need to reshape Flink into an > > > "ML-Native" > > > > >> > engine. > > > > >> > > > > > > > > >> > > > > What is changing today is that model inference itself is > > > > >> increasingly > > > > >> > > > > becoming part of the data processing pipeline; multimodal > > > > objects > > > > >> > are no > > > > >> > > > > longer just opaque blobs in external storage, but data > > objects > > > > that > > > > >> > need > > > > >> > > > to > > > > >> > > > > be referenced, passed, transformed, inferred over, and > > landed > > > > >> inside > > > > >> > the > > > > >> > > > > engine. This is not simply one more ML use case — it is a > > > > change in > > > > >> > the > > > > >> > > > > shape of workloads Flink needs to support. > > > > >> > > > > > > > > >> > > > > On whether the user demand is real, the validation signals > > we > > > > are > > > > >> > > > currently > > > > >> > > > > seeing include: > > > > >> > > > > > > > > >> > > > > - Within Alibaba, multimodal data processing is already > > in > > > > >> > production, > > > > >> > > > > covering image, video, audio, and text modalities. > > > > >> > > > > - In offline conversations with several companies > > > (including > > > > >> > ByteDance > > > > >> > > > > and Tencent), we have heard substantial demand for > Flink > > to > > > > >> > support > > > > >> > > > AI data > > > > >> > > > > processing / multimodal data processing. > > > > >> > > > > - On the ecosystem side, we are working with NVIDIA on > a > > > > joint > > > > >> > demo > > > > >> > > > > focused on multimodal data processing, planned for > Flink > > > > Forward > > > > >> > Asia. > > > > >> > > > > - The emergence and growth of systems such as Daft, Ray > > > Data, > > > > >> > > > > Data-Juicer, and LAS also reflect rapidly growing > demand > > > for > > > > >> > > > multimodal > > > > >> > > > > data processing. > > > > >> > > > > - There have also been independent discussions in this > > > > direction > > > > >> > > > within > > > > >> > > > > the community — for example, the "Streaming-native AI > > > > Inference > > > > >> > > > Runtime > > > > >> > > > > Layer" proposal on the dev list. > > > > >> > > > > > > > > >> > > > > On "why now, instead of waiting for standardization" — I > > > > understand > > > > >> > the > > > > >> > > > > concern. LLM-related frameworks, APIs, and > application-level > > > > >> > patterns are > > > > >> > > > > indeed changing quickly. If this FLIP were trying to bake > a > > > > >> specific > > > > >> > LLM > > > > >> > > > > API, agent framework, or prompt protocol into Flink, the > > risk > > > > would > > > > >> > be > > > > >> > > > high. > > > > >> > > > > > > > > >> > > > > But most of the capabilities in this proposal are not > > > > LLM-specific. 
> > > > >> > They > > > > >> > > > > are more fundamental data processing and runtime > > capabilities: > > > > >> > Pipeline > > > > >> > > > > Region-level checkpointing, Object Reference, GPU resource > > > > >> > declaration, > > > > >> > > > > columnar data transfer, service-style operator invocation, > > > > >> > long-running > > > > >> > > > > async execution. These are useful for today's LLM > workloads, > > > and > > > > >> > equally > > > > >> > > > > useful for future AI workloads in shapes we cannot fully > > > predict > > > > >> > yet. The > > > > >> > > > > fast-changing parts should live in the ecosystem and SDK > > > layer; > > > > the > > > > >> > FLIP > > > > >> > > > > should focus on more stable engine-level capabilities. > > > > >> > > > > > > > > >> > > > > On tactical changes vs. umbrella, I partly agree with you. > > > Each > > > > >> > sub-FLIP > > > > >> > > > > should be discussed, reviewed, and accepted or rejected on > > its > > > > own > > > > >> > > > merits. > > > > >> > > > > The umbrella should not bypass the normal FLIP process, > and > > > > >> > accepting the > > > > >> > > > > umbrella does not mean accepting all sub-FLIPs. That > said, I > > > > still > > > > >> > think > > > > >> > > > > the umbrella is valuable. Its purpose is not to bind the > 11 > > > > changes > > > > >> > into > > > > >> > > > a > > > > >> > > > > single inseparable package, but to help the community > align > > on > > > > >> > > > principles, > > > > >> > > > > clarify boundaries and dependencies, and avoid conflicting > > or > > > > >> > duplicated > > > > >> > > > > abstractions across related capabilities. > > > > >> > > > > > > > > >> > > > > For example, if RpcOperator is not considered together > with > > > > >> > > > non-disruptive > > > > >> > > > > scaling, it is hard to give GPU operator elasticity > coherent > > > > >> > semantics. > > > > >> > > > > Deploying inference services independently is only the > first > > > > step; > > > > >> > the > > > > >> > > > > harder question is how Flink uniformly handles service > > > > discovery, > > > > >> > > > in-flight > > > > >> > > > > request draining, backpressure, and failover during > scaling. > > > > >> Without > > > > >> > an > > > > >> > > > > umbrella, these capabilities can certainly be advanced as > > > > tactical > > > > >> > > > changes, > > > > >> > > > > but we may end up with a set of abstractions that are > > locally > > > > >> usable > > > > >> > but > > > > >> > > > > globally inconsistent. > > > > >> > > > > > > > > >> > > > > On RpcOperator, I agree that we need to be very careful in > > > > defining > > > > >> > the > > > > >> > > > > boundary between the Flink runtime and external > > orchestration > > > > >> > systems. > > > > >> > > > > Kubernetes or the Kubernetes Operator may well be the > right > > > > choice > > > > >> > at the > > > > >> > > > > physical deployment level. But I still believe Flink > needs a > > > > >> > first-class > > > > >> > > > > RpcOperator abstraction, because deployment is only part > of > > > the > > > > >> > problem — > > > > >> > > > > the harder part is its semantic integration with the Flink > > > job. > > > > >> > > > > > > > > >> > > > > If model inference is part of the logical data flow, Flink > > > > needs at > > > > >> > > > minimum > > > > >> > > > > to be aware of its service discovery, backpressure > behavior, > > > > >> failover > > > > >> > > > > behavior, in-flight request draining, and scaling > > > coordination. 
> > > > If > > > > >> > it is > > > > >> > > > > hidden entirely behind an external black-box service, it > is > > > hard > > > > >> for > > > > >> > > > Flink > > > > >> > > > > to provide consistent job-level semantics and operational > > > > >> experience. > > > > >> > > > > > > > > >> > > > > So the point of RpcOperator is not necessarily that "every > > > > physical > > > > >> > > > process > > > > >> > > > > must be directly launched and managed by Flink core," but > > that > > > > >> Flink > > > > >> > > > needs > > > > >> > > > > to define a service-style operator contract that allows > such > > > > >> > operators to > > > > >> > > > > be invoked correctly by the data flow, coordinated > correctly > > > by > > > > the > > > > >> > > > > runtime, and understood and operated by users as part of a > > > Flink > > > > >> job. > > > > >> > > > > > > > > >> > > > > On vectorized batch processing, I agree the long-term > > > direction > > > > >> > should > > > > >> > > > not > > > > >> > > > > stop at Python. Native columnar / vectorized execution is > an > > > > >> > end-to-end > > > > >> > > > > problem that touches connectors, formats, the type system, > > > > runtime, > > > > >> > Java, > > > > >> > > > > SQL, and Python. The current proposal starts from the > > > > Java/Python > > > > >> > > > boundary > > > > >> > > > > because that is where the row/column conversion overhead > is > > > most > > > > >> > visible. > > > > >> > > > > End-to-end columnar execution on the Java and SQL side > > > deserves > > > > to > > > > >> be > > > > >> > > > > discussed further as a separate, larger FLIP. > > > > >> > > > > > > > > >> > > > > On multimodal types and SerDes complexity, I agree this > > needs > > > > to be > > > > >> > > > handled > > > > >> > > > > carefully. Making AI-related objects first-class does not > > > imply > > > > >> that > > > > >> > > > every > > > > >> > > > > connector must immediately and fully support image, video, > > > > audio, > > > > >> > tensor, > > > > >> > > > > and so on. The concrete incremental path, fallback > strategy, > > > and > > > > >> the > > > > >> > > > > boundary between formats, connector API, and the type > system > > > > will > > > > >> be > > > > >> > > > > discussed further in the corresponding sub-FLIPs. > > > > >> > > > > > > > > >> > > > > Coming back to the core of the proposal: it is not about > > > turning > > > > >> > Flink > > > > >> > > > into > > > > >> > > > > an AI framework. It is about making the core objects and > > > > execution > > > > >> > > > patterns > > > > >> > > > > of AI-oriented data processing first-class citizens in > > Flink. > > > > >> > > > > > > > > >> > > > > Best, > > > > >> > > > > Guowei > > > > >> > > > > > > > > >> > > > > > > > > >> > > > > On Thu, Apr 30, 2026 at 5:37 AM Yaroslav Tkachenko < > > > > >> > [email protected] > > > > >> > > > > > > > > >> > > > > wrote: > > > > >> > > > > > > > > >> > > > > > Hi Guowei, > > > > >> > > > > > > > > > >> > > > > > Thank you for writing this proposal. > > > > >> > > > > > > > > > >> > > > > > I may be in the minority here, but I hope my voice will > be > > > > >> heard. I > > > > >> > > > > > disagree with turning Flink into an "AI-Native" engine. 
> > > > >> > > > > > Regarding your "Data processing is entering the AI era, and Flink needs to evolve from a traditional BI compute engine into a data engine that natively supports AI workloads" claim:
> > > > >> > > > > >
> > > > >> > > > > > - How exactly do you define "AI"? I don't believe there is a standard definition. For example, Machine Learning has been around for more than a decade, but there were no proposals (or need, in my opinion) to turn Flink into an "ML-Native" engine. Flink, in its current state, has been successfully used in many systems alongside dedicated ML technologies, like feature stores. Based on the context of your proposal, it looks like you mostly mean LLMs, so could you be specific about the language?
> > > > >> > > > > > - I wouldn't call Flink "a traditional BI compute engine". Flink is a general data processing technology which can be used for a variety of use cases without any BI involvement.
> > > > >> > > > > > - Do you have any proof that "Users' core workloads are rapidly evolving" and that they require your proposed changes? Case studies, user surveys, or submitted issues about the lack of support? Big changes like that require extensive validation.
> > > > >> > > > > > - And even if there is a real need to adopt some LLM-driven changes, why now? The LLM-related tooling has been changing so rapidly, and it's hard to predict what will be needed tomorrow. Why does it make sense to introduce changes now, and not wait for more standardization and consolidation?
> > > > >> > > > > >
> > > > >> > > > > > To summarize, I think there are a lot of great ideas in the proposal, but in my mind, they need to be addressed as tactical, focused changes, not under the "AI-Native" umbrella.
> > > > >> > > > > >
> > > > >> > > > > > I also wanted to address a few more specific points:
> > > > >> > > > > >
> > > > >> > > > > > - RpcOperator: why does it need to be managed by Flink? I see absolutely no need to introduce the additional complexity of orchestrating standalone components into the core Flink engine. I can imagine a separate sub-project for an RpcOperator, which could potentially be managed by the Kubernetes Operator.
> > > > >> > > > > > - You make the case for vectorized batch processing, but only on the Python side. Why stop there?
Native columnar vectorized > > > > execution > > > > >> > will > > > > >> > > > > > require end-to-end changes, including connectors, data > > > format > > > > >> > support, > > > > >> > > > Type > > > > >> > > > > > system support, runtime changes, etc. It seems logical > to > > me > > > > to > > > > >> > support > > > > >> > > > > > this execution mode for Java and SQL as well. > > > > >> > > > > > - Supporting many more data types natively (images, > video, > > > > audio, > > > > >> > > > tensors) > > > > >> > > > > > will make connector serializers and deserializers > (SerDes) > > > > much > > > > >> > more > > > > >> > > > > > challenging to implement. Even today, many SerDes in > > > > officially > > > > >> > > > supported > > > > >> > > > > > connectors don't fully implement types like arrays and > > > > structs. > > > > >> > > > > > > > > > >> > > > > > Thank you. > > > > >> > > > > > > > > > >> > > > > > On Wed, Apr 29, 2026 at 1:18 AM Guowei Ma < > > > > [email protected]> > > > > >> > > > wrote: > > > > >> > > > > > > > > > >> > > > > > > Hi Z > > > > >> > > > > > > > > > > >> > > > > > > Thanks for the kind words and the thoughtful > questions. > > > Let > > > > me > > > > >> > take > > > > >> > > > them > > > > >> > > > > > > one by one. > > > > >> > > > > > > > > > > >> > > > > > > 1. Throughput and latency targets > > > > >> > > > > > > > > > > >> > > > > > > To be honest, I don't have concrete numbers to share > > yet. > > > > What > > > > >> I > > > > >> > can > > > > >> > > > say > > > > >> > > > > > is > > > > >> > > > > > > that our internal testing has already surfaced several > > > > >> directions > > > > >> > > > where > > > > >> > > > > > > Flink can be improved, and at the same time we want to > > > fully > > > > >> > leverage > > > > >> > > > > > > Flink's existing streaming shuffle capabilities. As > the > > > > >> > multimodal > > > > >> > > > > > operator > > > > >> > > > > > > library matures, we'll progressively publish benchmark > > > > results. > > > > >> > > > > > > > > > > >> > > > > > > 2. Built-in operators > > > > >> > > > > > > > > > > >> > > > > > > You're absolutely right. From what I've seen, our > > internal > > > > >> users > > > > >> > > > already > > > > >> > > > > > > rely on a fairly large set of multimodal operators — > > > > >> potentially > > > > >> > > > 100+. > > > > >> > > > > > The > > > > >> > > > > > > exact set the community should provide is best > discussed > > > in > > > > >> > FLIP-XXX: > > > > >> > > > > > > Built-in Multimodal Operators and AI Functions, and > > > > >> contributions > > > > >> > > > from > > > > >> > > > > > the > > > > >> > > > > > > community are very welcome there. > > > > >> > > > > > > > > > > >> > > > > > > 3. Plan for the 11 sub-FLIPs > > > > >> > > > > > > > > > > >> > > > > > > The sequencing follows the layering in the umbrella: > > > > >> > > > > > > > > > > >> > > > > > > - Layer 1 (Core Primitives) should be discussed and > > > > aligned > > > > >> > first, > > > > >> > > > > > since > > > > >> > > > > > > the second and third layers build on it. > > > > >> > > > > > > - Layer 2 (API + compilation + single-node > execution) > > > > starts > > > > >> > with > > > > >> > > > > > > getting the API discussion right — the Python API, > > how > > > > UDFs > > > > >> > > > declare > > > > >> > > > > > > resources, etc. — after which the single-node > > execution > > > > work > > > > >> > can > > > > >> > > > build > > > > >> > > > > > > on > > > > >> > > > > > > top. 
> > > > >> > > > > > > - Layer 3 (distributed scheduling and > checkpointing) > > > can > > > > >> > largely > > > > >> > > > > > proceed > > > > >> > > > > > > independently in parallel. > > > > >> > > > > > > > > > > >> > > > > > > So while each sub-FLIP is indeed a substantial piece > of > > > > work, > > > > >> > most of > > > > >> > > > > > them > > > > >> > > > > > > can be advanced in parallel by different contributors > > once > > > > the > > > > >> > Layer > > > > >> > > > 1 > > > > >> > > > > > > primitives are settled. > > > > >> > > > > > > > > > > >> > > > > > > 4. GPU scheduling roadmap > > > > >> > > > > > > > > > > >> > > > > > > Could you expand a bit on which aspect of GPU > scheduling > > > you > > > > >> > have in > > > > >> > > > mind > > > > >> > > > > > > as the complex one? "GPU scheduling" covers a fairly > > wide > > > > >> surface > > > > >> > > > area > > > > >> > > > > > > (resource declaration, operator-level deployment, > > elastic > > > > >> > scaling, > > > > >> > > > > > > heterogeneous GPU types, fine-grained partitioning, > > etc.), > > > > and > > > > >> > the > > > > >> > > > answer > > > > >> > > > > > > differs quite a bit depending on which dimension we're > > > > >> > discussing. > > > > >> > > > Once I > > > > >> > > > > > > understand your specific concern, I can give a more > > useful > > > > >> > response. > > > > >> > > > > > > > > > > >> > > > > > > Thanks again for the support — looking forward to the > > > > continued > > > > >> > > > > > discussion. > > > > >> > > > > > > > > > > >> > > > > > > Best, > > > > >> > > > > > > Guowei > > > > >> > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > On Tue, Apr 28, 2026 at 4:34 PM zl z < > > > [email protected] > > > > > > > > > >> > wrote: > > > > >> > > > > > > > > > > >> > > > > > > > Hey Guowei, > > > > >> > > > > > > > > > > > >> > > > > > > > Thanks for the proposal, and I think this is very > > > > valuable. I > > > > >> > have > > > > >> > > > some > > > > >> > > > > > > > questions about it: > > > > >> > > > > > > > > > > > >> > > > > > > > 1. What are our expected throughput and latency > > targets? > > > > Do > > > > >> we > > > > >> > > > have any > > > > >> > > > > > > > forward-looking tests for this? > > > > >> > > > > > > > > > > > >> > > > > > > > 2. AI involves a very large number of operators. > > Besides > > > > >> > allowing > > > > >> > > > users > > > > >> > > > > > > to > > > > >> > > > > > > > use them through UDFs, will we also provide commonly > > > used > > > > >> > built-in > > > > >> > > > > > > > operators? > > > > >> > > > > > > > > > > > >> > > > > > > > 3. Each of the 11 sub-FLIPs is a major project > > > involving a > > > > >> > > > significant > > > > >> > > > > > > > amount of change. What is our plan for this? > > > > >> > > > > > > > > > > > >> > > > > > > > 4. GPU scheduling is extremely complex. What is our > > > > current > > > > >> > > > roadmap for > > > > >> > > > > > > > this? > > > > >> > > > > > > > > > > > >> > > > > > > > This is a very high-quality and exciting proposal. > > > Making > > > > >> > Flink an > > > > >> > > > > > > > AI-native data processing engine will make it far > more > > > > >> > valuable in > > > > >> > > > the > > > > >> > > > > > AI > > > > >> > > > > > > > era. Looking forward to seeing it land and come to > > fruition > > > > >> soon. 
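(Of the GPU dimensions listed above, "resource declaration" is the one Flink already partially covers through the external resource framework from FLIP-108. As a baseline for that part of the discussion, this is roughly what declaring and consuming a GPU looks like today. The configuration keys and the getExternalResourceInfos call are existing Flink API as far as I can tell; the "index" property name follows the GPU plugin's convention if I recall it correctly, and the inference body is a placeholder.)

    // Cluster configuration (flink-conf.yaml), enabling the existing
    // external resource framework for GPUs:
    //
    //   external-resources: gpu
    //   external-resource.gpu.amount: 1
    //   external-resource.gpu.driver-factory.class:
    //     org.apache.flink.externalresource.gpu.GPUDriverFactory

    import java.util.Set;
    import org.apache.flink.api.common.externalresource.ExternalResourceInfo;
    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.configuration.Configuration;

    public class GpuInferenceFunction extends RichMapFunction<float[], float[]> {

        private transient String gpuIndex;

        @Override
        public void open(Configuration parameters) {
            // Each TaskManager exposes the GPUs it was allocated; the UDF
            // picks one up and binds its inference runtime to that device.
            Set<ExternalResourceInfo> gpus =
                    getRuntimeContext().getExternalResourceInfos("gpu");
            gpuIndex = gpus.iterator().next()
                    .getProperty("index")
                    .orElseThrow(() -> new IllegalStateException("No GPU allocated"));
        }

        @Override
        public float[] map(float[] input) {
            // Placeholder: a real implementation would run inference on
            // the device identified by gpuIndex.
            return input;
        }
    }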
> > > > >> > > > > > > > > > > > >> > > > > > > > Robert Metzger <[email protected]> wrote on > > > > Tue, Apr 28, 2026 at 14:38: > > > > >> > > > > > > > > > > > >> > > > > > > > > Hey Guowei, > > > > >> > > > > > > > > > > > > >> > > > > > > > > Thanks for the proposal. I just took a brief look, > > > here > > > > are > > > > >> > some > > > > >> > > > high > > > > >> > > > > > > > level > > > > >> > > > > > > > > questions: > > > > >> > > > > > > > > > > > > >> > > > > > > > > Regarding the RPC Operator: What is the difference > > to > > > > the > > > > >> > async > > > > >> > > > io > > > > >> > > > > > > > operator > > > > >> > > > > > > > > we have already? > > > > >> > > > > > > > > > > > > >> > > > > > > > > "Connector API for Multimodal Data Source/Sink": > Why > > > do > > > > we > > > > >> > need > > > > >> > > > to > > > > >> > > > > > > touch > > > > >> > > > > > > > > the connector API for supporting multimodal data? > > > Isn't > > > > >> this > > > > >> > > > more of > > > > >> > > > > > a > > > > >> > > > > > > > > formats concern? > > > > >> > > > > > > > > > > > > >> > > > > > > > > "Non-Disruptive Scaling for CPU Operators": How do > > you > > > > want > > > > >> > to > > > > >> > > > > > > guarantee > > > > >> > > > > > > > > exactly-once on that kind of scaling? E.g. you > need > > to > > > > >> > somehow > > > > >> > > > make a > > > > >> > > > > > > > > handover between the old and new pipeline. > > > > >> > > > > > > > > > > > > >> > > > > > > > > Overall, I find the proposal has some things which > > > seem > > > > >> > related > > > > >> > > > to > > > > >> > > > > > > making > > > > >> > > > > > > > > Flink more AI native, but other changes seem > > > orthogonal > > > > to > > > > >> > that. > > > > >> > > > For > > > > >> > > > > > > > > example, the checkpoint or scaling changes are > > actually > > > > >> > unrelated > > > > >> > > > to > > > > >> > > > > > AI, > > > > >> > > > > > > > and > > > > >> > > > > > > > > just engine improvements. > > > > >> > > > > > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > > On Tue, Apr 28, 2026 at 5:48 AM Guowei Ma < > > > > >> > [email protected]> > > > > >> > > > > > > wrote: > > > > >> > > > > > > > > > > > > >> > > > > > > > > > Hi everyone, > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > I'd like to start a discussion on an umbrella > > > FLIP[1] > > > > >> that > > > > >> > lays > > > > >> > > > > > out a > > > > >> > > > > > > > > > direction for evolving Flink into a data engine > > that > > > > >> > natively > > > > >> > > > > > > supports > > > > >> > > > > > > > AI > > > > >> > > > > > > > > > workloads. > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > The short version: user workloads are shifting > > from > > > BI > > > > >> > > > analytics to > > > > >> > > > > > > > > > multimodal data processing centered on model > > > > inference, > > > > >> and > > > > >> > > > this > > > > >> > > > > > > > triggers > > > > >> > > > > > > > > > cascading changes across the stack — multimodal > > data > > > > >> > flowing > > > > >> > > > > > through > > > > >> > > > > > > > > > pipelines, heterogeneous CPU/GPU resources, > > > vectorized > > > > >> > > > execution, > > > > >> > > > > > and > > > > >> > > > > > > > > > inference tasks that run for seconds to minutes > on > > > > Spot > > > > >> > > > instances. 
> > > > >> > > > > > > The > > > > >> > > > > > > > > > proposal sketches an evolution along five > > directions > > > > >> > > > (development > > > > >> > > > > > > > > paradigm, > > > > >> > > > > > > > > > data model, heterogeneous resources, execution > > > engine, > > > > >> > fault > > > > >> > > > > > > > tolerance), > > > > >> > > > > > > > > > decomposed into 11 sub-FLIPs organized into > three > > > > layers: > > > > >> > core > > > > >> > > > > > > runtime > > > > >> > > > > > > > > > primitives, AI workload expression and > execution, > > > and > > > > >> > > > > > > production-grade > > > > >> > > > > > > > > > operational guarantees. Most sub-FLIPs have no > > hard > > > > >> > > > dependencies on > > > > >> > > > > > > > each > > > > >> > > > > > > > > > other and can be advanced in parallel. > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > A note on scope, since it's an umbrella: > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > - In scope here: whether the evolution > directions > > > are > > > > >> > > > reasonable, > > > > >> > > > > > > > whether > > > > >> > > > > > > > > > each sub-FLIP's motivation and proposed approach > > are > > > > >> > > > well-founded, > > > > >> > > > > > > and > > > > >> > > > > > > > > > whether the boundaries and dependencies between > > > > sub-FLIPs > > > > >> > are > > > > >> > > > > > clear. > > > > >> > > > > > > > > > - Out of scope here: detailed designs, API > > > specifics, > > > > and > > > > >> > > > > > > > implementation > > > > >> > > > > > > > > > plans of individual sub-FLIPs — those will go > > > through > > > > >> > their own > > > > >> > > > > > > FLIPs. > > > > >> > > > > > > > > > - Consensus criteria: agreement on the overall > > > > direction > > > > >> is > > > > >> > > > > > > sufficient > > > > >> > > > > > > > > for > > > > >> > > > > > > > > > the umbrella to pass; passing it does not lock > in > > > any > > > > >> > > > sub-FLIP's > > > > >> > > > > > > > design — > > > > >> > > > > > > > > > sub-FLIPs may still be adjusted, deferred, or > > > > withdrawn > > > > >> as > > > > >> > they > > > > >> > > > > > > > progress. > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > All proposed changes are incremental — no > existing > > > > API or > > > > >> > > > behavior > > > > >> > > > > > is > > > > >> > > > > > > > > > removed or altered. Compatibility details are > > > covered > > > > at > > > > >> > the > > > > >> > > > end of > > > > >> > > > > > > the > > > > >> > > > > > > > > > document. > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > Looking forward to your feedback on the overall > > > > direction > > > > >> > and > > > > >> > > > the > > > > >> > > > > > > > > layering. > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > [1] > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > >> > > > > > >> > > > > > > > > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=421957275 > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > Thanks, > > > > >> > > > > > > > > > Guowei > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > >> > > > > > >> > > > > > > > > > >
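(For context on Robert's async I/O question earlier in the thread: a remote model call in today's Flink is typically written against the existing async I/O API, roughly as below. RichAsyncFunction, ResultFuture, and AsyncDataStream are real Flink classes; ModelClient is a hypothetical stand-in for whatever HTTP/gRPC serving SDK a job uses. The RpcOperator discussion is essentially about everything this snippet does not cover: service lifecycle, GPU placement, and checkpoint semantics for long-running requests.)

    import java.util.Collections;
    import java.util.concurrent.CompletableFuture;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.async.ResultFuture;
    import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;

    // Today's baseline: calling a model-serving endpoint via async I/O.
    public class PredictFunction extends RichAsyncFunction<String, String> {

        // Hypothetical client interface, in place of a real serving SDK.
        interface ModelClient {
            CompletableFuture<String> predictAsync(String input);
        }

        private transient ModelClient client;

        @Override
        public void open(Configuration parameters) {
            // Stubbed out here; a real job would build the client against
            // a configured endpoint.
            client = input -> CompletableFuture.completedFuture("stub:" + input);
        }

        @Override
        public void asyncInvoke(String input, ResultFuture<String> resultFuture) {
            client.predictAsync(input).whenComplete((response, err) -> {
                if (err != null) {
                    resultFuture.completeExceptionally(err);
                } else {
                    resultFuture.complete(Collections.singleton(response));
                }
            });
        }
    }

    // Wiring, with a timeout and a bounded number of in-flight requests:
    //
    //   DataStream<String> predictions = AsyncDataStream.unorderedWait(
    //           inputs, new PredictFunction(), 30, TimeUnit.SECONDS, 100);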
