[ 
https://issues.apache.org/jira/browse/FLINK-38825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dian Fu reassigned FLINK-38825:
-------------------------------

    Assignee: featzhang

> Introduce an AI-friendly Async Batch Operator for high-latency inference 
> workloads
> ----------------------------------------------------------------------------------
>
>                 Key: FLINK-38825
>                 URL: https://issues.apache.org/jira/browse/FLINK-38825
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Task
>            Reporter: featzhang
>            Assignee: featzhang
>            Priority: Major
>              Labels: pull-request-available
>
> h4. Background
> Apache Flink currently provides {{AsyncFunction}} and {{AsyncWaitOperator}} 
> for record-level asynchronous I/O.
> While this model works well for traditional lookup-style workloads, it does 
> not align well with {*}modern AI / ML inference and high-latency external 
> services{*}, which typically prefer *batch-based execution* and require 
> tighter control over latency, batching, and concurrency.
> Typical examples include:
>  * GPU-based model inference where batching significantly improves throughput
>  * External inference or embedding services exposing batch APIs
>  * RPC / database systems with high per-request overhead
> To address this gap, this issue introduces a *batch-oriented async processing 
> foundation* in the DataStream API.
> ----
> h4. What has been implemented
> This issue has been implemented incrementally via {*}7 focused pull 
> requests{*}, providing a complete and reviewable initial solution:
>  # *New public API*
>  ** Introduced {{AsyncBatchFunction<IN, OUT>}} ({{@PublicEvolving}})
>  ** Enables users to perform async I/O over a _batch_ of input records
>  # *New runtime operator*
>  ** Added {{AsyncBatchWaitOperator}} with *unordered semantics*
>  ** Supports *size-based batch triggering*
>  ** Supports *time-based batch triggering*
>  ** Flushes remaining records on end-of-input
>  ** Preserves existing async failure semantics
>  # *Stream API entry point*
>  ** Added {{AsyncDataStream.unorderedWaitBatch(...)}}
>  ** Fully additive and consistent with existing async APIs
>  # *Robust test coverage*
>  ** Batch size triggering
>  ** Batch time triggering
>  ** Correct result emission
>  ** Exception propagation and failure handling
>  # *Incremental and review-friendly design*
>  ** Implementation intentionally split into multiple PRs
>  ** Each PR focuses on a single concern (API, operator, time trigger, tests, 
> etc.)
>  ** No changes to existing async APIs or behavior
> ----
> h4. Current scope and guarantees
>  * Fully backward-compatible
>  * No changes to {{AsyncFunction}} or {{AsyncWaitOperator}}
>  * Opt-in, additive API only
>  * Designed as a minimal but extensible foundation
> This implementation already enables *practical batch-based inference 
> pipelines* in Flink with significantly reduced boilerplate compared to 
> record-level async I/O.
> ----
> h4. What is intentionally NOT included (follow-up work)
> The following items are *explicitly out of scope for the initial 
> implementation* and can be addressed incrementally in follow-up issues or PRs:
>  * Ordered batch semantics
>  * Event-time–based batching
>  * Retry / timeout / fallback strategies
>  * Batch-level concurrency controls
>  * Inference-specific metrics and observability
>  * SQL / Table API / Python API integration
> ----
> h4. Summary
> This issue is no longer a pure proposal:
> it now provides a *working, tested, and extensible async batch processing 
> primitive* in Flink, suitable for AI inference and other high-latency 
> batch-oriented workloads, while keeping the core async API stable and 
> backward-compatible.
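
For readers skimming the description above, the size-based trigger, time-based trigger, and end-of-input flush can be illustrated with a minimal, Flink-free sketch. Everything in it is an assumption made for illustration: {{BatchFn}}, {{BatchBuffer}}, {{onTimer}} and {{drain}} are hypothetical names, and the real {{AsyncBatchFunction}} / {{AsyncBatchWaitOperator}} signatures are not given in this message and may differ. Timestamps are passed in explicitly so the behaviour is deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch only; none of these types are Flink classes.
class AsyncBatchSketch {

    /** Hypothetical stand-in for a batch-oriented async function. */
    interface BatchFn<IN, OUT> {
        CompletableFuture<List<OUT>> invoke(List<IN> batch);
    }

    /** Buffers records and releases a batch on size or on elapsed time. */
    static final class BatchBuffer<T> {
        private final int maxSize;
        private final long maxWaitMs;
        private List<T> pending = new ArrayList<>();
        private long firstArrivalMs;

        BatchBuffer(int maxSize, long maxWaitMs) {
            this.maxSize = maxSize;
            this.maxWaitMs = maxWaitMs;
        }

        /** Adds a record; returns a batch if the size trigger fired, else null. */
        List<T> add(T record, long nowMs) {
            if (pending.isEmpty()) {
                firstArrivalMs = nowMs; // remember when this batch started
            }
            pending.add(record);
            return pending.size() >= maxSize ? drain() : null;
        }

        /** Time trigger: returns the batch if its oldest record waited long enough. */
        List<T> onTimer(long nowMs) {
            boolean due = !pending.isEmpty() && nowMs - firstArrivalMs >= maxWaitMs;
            return due ? drain() : null;
        }

        /** End-of-input flush: returns whatever is buffered (possibly empty). */
        List<T> drain() {
            List<T> out = pending;
            pending = new ArrayList<>();
            return out;
        }
    }

    public static void main(String[] args) {
        // Toy "inference" call: maps each string in the batch to its length.
        BatchFn<String, Integer> lengths = batch -> CompletableFuture.supplyAsync(() -> {
            List<Integer> result = new ArrayList<>();
            for (String s : batch) {
                result.add(s.length());
            }
            return result;
        });

        BatchBuffer<String> buffer = new BatchBuffer<>(3, 50L);
        List<CompletableFuture<List<Integer>>> inFlight = new ArrayList<>();
        long now = 0L;
        for (String s : new String[] {"a", "bb", "ccc", "dddd"}) {
            List<String> batch = buffer.add(s, now++);
            if (batch != null) {
                inFlight.add(lengths.invoke(batch)); // size trigger fired
            }
        }
        List<String> rest = buffer.drain(); // end-of-input flush
        if (!rest.isEmpty()) {
            inFlight.add(lengths.invoke(rest));
        }
        for (CompletableFuture<List<Integer>> f : inFlight) {
            System.out.println(f.join());
        }
    }
}
```

With a batch size of 3, the four inputs produce one full batch and one flushed remainder, so the program prints [1, 2, 3] followed by [4]. The futures are joined in submission order here only to keep the output stable; an operator with unordered semantics would be free to emit whichever batch completes first.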



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
