[ 
https://issues.apache.org/jira/browse/FLINK-39630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18079277#comment-18079277
 ] 

featzhang commented on FLINK-39630:
-----------------------------------

I would like to work on this sub-task under the FLINK-39625 umbrella. Could a 
committer please assign it to me (Jira username: featzhang)? Thanks!

> Schedule GPU-affinity operators via ResourceManager
> ---------------------------------------------------
>
>                 Key: FLINK-39630
>                 URL: https://issues.apache.org/jira/browse/FLINK-39630
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Coordination
>            Reporter: featzhang
>            Priority: Major
>              Labels: gpu, model-inference
>
> h2. Background
> The GPU sidecar is a per-node resource: every {{TaskManager}} hosting a
> sidecar loads the model once and serves all local operators through it.
> For this to work efficiently, operators whose execution depends on the
> sidecar must be scheduled onto slots backed by a node that actually runs
> a live sidecar process.
> This sub-task adds the scheduling hint and resource-matching logic, and
> plugs them into the existing ResourceManager flow. It depends on the
> {{GPUResource}} work already completed in the resource-profile sub-task.
> h2. Scope of this sub-task
> * Mark the GPU client operator from the async-operator sub-task with a
>  {{ResourceSpec}} containing a {{GPUResource}} requirement.
> * Extend the slot matcher so that slots advertised by non-GPU
>  TaskManagers are rejected for such operators.
> * Add a lightweight liveness probe in ResourceManager that verifies the
>  sidecar's {{/health}} endpoint before a slot is handed out; slots with
>  a not-ready sidecar are temporarily withheld.
> * Expose a metric counting the number of rejections due to missing
>  sidecar liveness, to aid diagnosis.
> h2. Out of scope
> * Global GPU placement across multiple clusters.
> * Re-scheduling on model-weight hot reload (the sidecar handles that
>  internally).
> h2. Acceptance criteria
> * Unit tests covering the slot matcher with mixed GPU and non-GPU
>  TaskManagers.
> * Integration test: deploying the async operator on a two-node standalone
>  cluster (one GPU node with mock sidecar, one plain node) schedules all
>  subtasks onto the GPU node.
> * Liveness probe failures are reflected in the new metric and in logs.
> h2. Affected modules
> * {{flink-runtime}}
> * {{flink-runtime-web}} (surface the new metric)
> h2. Links
> Parent: FLINK-39625 (umbrella issue).
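
The matching rules listed under "Scope of this sub-task" can be sketched in
plain Java as follows. This is only an illustration under stated assumptions,
not the proposed implementation: SlotOffer, GpuSlotFilter and the injected
health predicate are hypothetical names that do not exist in Flink, and the
sketch deliberately avoids Flink's real slot-matching classes. It shows the
three decisions described in the issue: reject slots from non-GPU
TaskManagers, withhold slots whose sidecar fails the /health check, and count
only the liveness-related rejections.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Predicate;

/** Hypothetical, simplified view of a slot offered by a TaskManager. */
final class SlotOffer {
    final String taskManagerId;
    final boolean hasGpuResource;  // TaskManager advertises a GPU resource
    final String sidecarHealthUrl; // sidecar /health URL, or null if none

    SlotOffer(String taskManagerId, boolean hasGpuResource, String sidecarHealthUrl) {
        this.taskManagerId = taskManagerId;
        this.hasGpuResource = hasGpuResource;
        this.sidecarHealthUrl = sidecarHealthUrl;
    }
}

/** Hypothetical filter applied before a slot is handed to a GPU operator. */
final class GpuSlotFilter {
    // The liveness probe is injected so it can be an HTTP call in production
    // and a stub in tests; it returns true when the sidecar reports healthy.
    private final Predicate<String> sidecarIsHealthy;

    // Counts rejections caused by a missing or not-ready sidecar only.
    private final AtomicLong livenessRejections = new AtomicLong();

    GpuSlotFilter(Predicate<String> sidecarIsHealthy) {
        this.sidecarIsHealthy = sidecarIsHealthy;
    }

    /** Returns true if the offer may be used for the given operator. */
    boolean accepts(SlotOffer offer, boolean operatorNeedsGpu) {
        if (!operatorNeedsGpu) {
            return true; // operators without a GPU requirement are unaffected
        }
        if (!offer.hasGpuResource) {
            return false; // non-GPU TaskManager: rejected, not a liveness issue
        }
        boolean live = offer.sidecarHealthUrl != null
                && sidecarIsHealthy.test(offer.sidecarHealthUrl);
        if (!live) {
            livenessRejections.incrementAndGet();
            return false; // withheld for now; a later attempt may succeed
        }
        return true;
    }

    /** Returns the offers that pass accepts(), preserving input order. */
    List<SlotOffer> eligible(List<SlotOffer> offers, boolean operatorNeedsGpu) {
        List<SlotOffer> result = new ArrayList<>();
        for (SlotOffer offer : offers) {
            if (accepts(offer, operatorNeedsGpu)) {
                result.add(offer);
            }
        }
        return result;
    }

    long livenessRejectionCount() {
        return livenessRejections.get();
    }
}
```

Injecting the probe as a predicate keeps the matching logic testable with
mixed GPU and non-GPU offers without a running sidecar, which is the shape of
the unit tests named in the acceptance criteria.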



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
