[ https://issues.apache.org/jira/browse/FLINK-39630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18079277#comment-18079277 ]
featzhang commented on FLINK-39630:
-----------------------------------
I would like to work on this sub-task under the FLINK-39625 umbrella. Could a
committer please assign it to me (Jira username: featzhang)? Thanks!
> Schedule GPU-affinity operators via ResourceManager
> ---------------------------------------------------
>
> Key: FLINK-39630
> URL: https://issues.apache.org/jira/browse/FLINK-39630
> Project: Flink
> Issue Type: Sub-task
> Components: Runtime / Coordination
> Reporter: featzhang
> Priority: Major
> Labels: gpu, model-inference
>
> h2. Background
> The GPU sidecar is a per-node resource: every {{TaskManager}} hosting a
> sidecar loads the model once and serves all local operators through it.
> For this to work efficiently, operators whose execution depends on the
> sidecar must be scheduled onto slots backed by a node that actually runs
> a live sidecar process.
> This sub-task adds the scheduling hint and resource-matching logic, and
> plugs them into the existing ResourceManager flow. It depends on the
> {{GPUResource}} work already completed in the resource-profile sub-task.
> h2. Scope of this sub-task
> * Mark the GPU client operator from the async-operator sub-task with a
> {{ResourceSpec}} containing a {{GPUResource}} requirement.
> * Extend the slot matcher so that slots advertised by non-GPU
> TaskManagers are rejected for such operators.
> * Add a lightweight liveness probe in ResourceManager that checks the
> sidecar's {{/health}} endpoint before a slot is handed out; slots whose
> sidecar is not ready are temporarily withheld.
> * Expose a metric that counts slot rejections caused by failed sidecar
> liveness checks, to aid diagnosis.
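> A minimal sketch of the matching rule described in the list above, for
> discussion only. {{Slot}}, {{Requirement}}, {{matches}} and
> {{sidecarRejections}} are hypothetical stand-ins invented for this example;
> they are not Flink APIs, and the real change would go through
> {{ResourceSpec}} / {{GPUResource}} and the existing slot matcher.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative only: simplified stand-ins, not Flink classes. */
public class GpuSlotMatcherSketch {

    /** Hypothetical view of a slot advertised by a TaskManager. */
    record Slot(String taskManagerId, int gpus, boolean sidecarReady) {}

    /** Hypothetical view of an operator's GPU requirement. */
    record Requirement(int gpus) {}

    /** Counts slots withheld because the sidecar was not ready. */
    static final AtomicLong sidecarRejections = new AtomicLong();

    static boolean matches(Slot slot, Requirement req) {
        if (req.gpus() == 0) {
            return true; // operators without a GPU requirement accept any slot
        }
        if (slot.gpus() < req.gpus()) {
            return false; // slot comes from a non-GPU (or too small) TaskManager
        }
        if (!slot.sidecarReady()) {
            sidecarRejections.incrementAndGet(); // feeds the diagnostic metric
            return false; // withheld until the sidecar reports ready
        }
        return true;
    }

    static List<Slot> eligible(List<Slot> slots, Requirement req) {
        List<Slot> result = new ArrayList<>();
        for (Slot slot : slots) {
            if (matches(slot, req)) {
                result.add(slot);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Slot> slots = List.of(
                new Slot("tm-gpu-ready", 1, true),
                new Slot("tm-gpu-starting", 1, false),
                new Slot("tm-plain", 0, true));
        for (Slot slot : eligible(slots, new Requirement(1))) {
            System.out.println("eligible: " + slot.taskManagerId());
        }
        System.out.println("sidecar rejections: " + sidecarRejections.get());
    }
}
```

> In this sketch only the rejection caused by a not-ready sidecar is counted;
> slots from plain TaskManagers are filtered out without touching the counter.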
> h2. Out of scope
> * Global GPU placement across multiple clusters.
> * Re-scheduling on model-weight hot reload (the sidecar handles that
> internally).
> h2. Acceptance criteria
> * Unit tests covering the slot matcher with mixed GPU and non-GPU
> TaskManagers.
> * Integration test: deploying the async operator on a two-node standalone
> cluster (one GPU node with mock sidecar, one plain node) schedules all
> subtasks onto the GPU node.
> * Liveness probe failures are reflected in the new metric and in logs.
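> To make the last criterion concrete, here is a rough sketch of a probe that
> records every failure in a counter and in the log. It uses only the JDK
> ({{java.net.http.HttpClient}} and {{java.util.logging}}); Flink itself logs
> through SLF4J and would register the counter with its metric system. The
> class name, the one-second timeout, the "HTTP 200 means ready" rule and the
> default URL are assumptions made for this example, not part of the proposal.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Logger;

/** Illustrative only: a standalone probe, not ResourceManager code. */
public class SidecarLivenessProbeSketch {

    private static final Logger LOG =
            Logger.getLogger(SidecarLivenessProbeSketch.class.getName());

    /** Stand-in for the proposed metric: number of failed probes. */
    static final AtomicLong failedProbes = new AtomicLong();

    // Assumed timeout; a real implementation would make this configurable.
    private static final Duration TIMEOUT = Duration.ofSeconds(1);

    private static final HttpClient CLIENT =
            HttpClient.newBuilder().connectTimeout(TIMEOUT).build();

    /** Returns true only if the health endpoint answers with HTTP 200. */
    static boolean isSidecarReady(URI healthUri) {
        HttpRequest request =
                HttpRequest.newBuilder(healthUri).timeout(TIMEOUT).GET().build();
        try {
            int status = CLIENT
                    .send(request, HttpResponse.BodyHandlers.discarding())
                    .statusCode();
            if (status == 200) {
                return true;
            }
            recordFailure(healthUri, "HTTP " + status);
        } catch (IOException e) {
            recordFailure(healthUri, e.toString());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            recordFailure(healthUri, "interrupted");
        }
        return false;
    }

    private static void recordFailure(URI healthUri, String reason) {
        failedProbes.incrementAndGet();
        LOG.warning("Sidecar liveness probe failed for " + healthUri + ": " + reason);
    }

    public static void main(String[] args) {
        // The default address is a placeholder; pass the real URL as an argument.
        URI uri = URI.create(args.length > 0 ? args[0] : "http://localhost:8080/health");
        System.out.println("ready=" + isSidecarReady(uri)
                + ", failedProbes=" + failedProbes.get());
    }
}
```

> A mock sidecar for the integration test could be as small as an embedded HTTP
> server that answers {{/health}} with 200 or 503.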
> h2. Affected modules
> * {{flink-runtime}}
> * {{flink-runtime-web}} (surface the new metric)
> h2. Links
> Parent: FLINK-39625 (umbrella issue for this sub-task).