[
https://issues.apache.org/jira/browse/FLINK-39629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18079276#comment-18079276
]
featzhang commented on FLINK-39629:
-----------------------------------
I would like to work on this sub-task under the FLINK-39625 umbrella. Could a
committer please assign it to me (Jira username: featzhang)? Thanks!
> Add Async I/O operator that invokes GPU sidecar from DataStream API
> -------------------------------------------------------------------
>
> Key: FLINK-39629
> URL: https://issues.apache.org/jira/browse/FLINK-39629
> Project: Flink
> Issue Type: Sub-task
> Components: API / DataStream
> Reporter: featzhang
> Priority: Major
> Labels: gpu, model-inference
>
> h2. Background
> With a functional sidecar that batches inference requests, user programs
> written against the DataStream API need a convenient, correctly ordered,
> back-pressure-friendly way to invoke the sidecar. Flink's existing Async
> I/O subsystem ({{AsyncDataStream}}, {{AsyncFunction}},
> {{RichAsyncFunction}}) already provides the required machinery for
> ordered / unordered result emission, timeouts, and capacity limits.
> This sub-task adds a first-party async function that delegates record-by-
> record inference to the sidecar.
> h2. Scope of this sub-task
> * Add {{GpuSidecarAsyncFunction<IN, OUT>}} in a new package under
> {{flink-streaming-java}} or a dedicated {{flink-gpu-client}} module.
> Constructor parameters:
> ** Sidecar endpoint (UDS or TCP).
> ** Request timeout.
> ** Maximum inflight requests (capacity).
> ** A pluggable {{TensorCodec<IN, OUT>}} that converts records to / from
> the sidecar's tensor wire format.
> * Manage one RPC channel per parallel subtask; reconnect with exponential
> backoff.
> * Honour Flink checkpointing: inflight requests are drained on checkpoint
> barriers when ordered emission is requested.
> * Surface the sidecar's structured error codes as
> {{AsyncFunction}}-level failures so the user program can apply Flink's
> standard restart strategies.
> h2. Out of scope
> * Table / SQL integration (tracked in a later sub-task).
> * Scheduling operators onto nodes with a live sidecar (tracked in a later
> sub-task).
> * Model-hot-reload UX on the DataStream side.
> h2. Acceptance criteria
> * End-to-end test: a job with a source, a
> {{GpuSidecarAsyncFunction}}, and a sink runs against the mock sidecar
> on CI and produces deterministic output under ordered emission.
> * Timeout and capacity settings behave as documented.
> * Clean shutdown on job cancellation; no connection leak.
> h2. Affected modules
> * {{flink-streaming-java}} (or new {{flink-gpu-client}})
> * {{flink-tests}} (integration test against mock sidecar)
> h2. Links
> Parent: see umbrella issue linked to this sub-task.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)