[ 
https://issues.apache.org/jira/browse/FLINK-39627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18079274#comment-18079274
 ] 

featzhang commented on FLINK-39627:
-----------------------------------

I would like to work on this sub-task under the FLINK-39625 umbrella. Could a 
committer please assign it to me (Jira username: featzhang)? Thanks!

> Introduce flink-gpu-sidecar module with service skeleton
> --------------------------------------------------------
>
>                 Key: FLINK-39627
>                 URL: https://issues.apache.org/jira/browse/FLINK-39627
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Build System, Runtime / Task
>            Reporter: featzhang
>            Priority: Major
>              Labels: gpu, model-inference
>
> h2. Background
> GPU-accelerated inference benefits significantly from keeping models
> resident in GPU memory and amortising the model-loading cost across many
> requests. The umbrella proposal introduces a long-lived sidecar process,
> co-located with
> each GPU-enabled TaskManager, that owns the model and serves inference
> requests over RPC. This sub-task establishes the module and the minimum
> service skeleton, without yet implementing the actual inference path.
> h2. Scope of this sub-task
> * Add a new Maven module {{flink-gpu-sidecar}} with the standard Flink
>  build conventions (license headers, shade configuration, module
>  descriptor, NOTICE file).
> * Define the configuration surface:
> ** {{sidecar.rpc.endpoint}} - bind address (UDS path or TCP host:port).
> ** {{sidecar.model.uri}} - location of the model to load at startup.
> ** {{sidecar.health.port}} - HTTP port exposing a {{/health}} endpoint.
> * Provide a process entry point that reads the config, exposes a
>  {{/health}} endpoint returning {{READY}} or {{NOT_READY}}, and runs
>  until it receives SIGTERM, at which point it shuts down gracefully.
> * Publish the minimal RPC service surface (proto file + generated stubs)
>  containing only a {{Ping}} method. The inference method is added in the
>  next sub-task.
> * Provide a script under {{flink-dist}} to start the sidecar in the
>  TaskManager's lifecycle directory, disabled by default.
> h2. Out of scope
> * No batching, no queueing, no real inference.
> * No integration with any specific model format (that is carried by
>  concrete backends added later).
> * No security / TLS (tracked separately).
> h2. Acceptance criteria
> * {{mvn -pl flink-gpu-sidecar -am verify}} passes.
> * Starting the sidecar with a minimal configuration reaches {{READY}} state
>  within five seconds on a developer laptop.
> * {{Ping}} RPC round-trips end-to-end in an integration test.
> * Clean shutdown on SIGTERM within the configured grace period.
> h2. Affected modules
> * New: {{flink-gpu-sidecar}}
> * {{flink-dist}} (opt-in launch script)
> h2. Links
> Parent: FLINK-39625 (umbrella issue).
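
For illustration only, the entry point described in the scope above could be sketched roughly as follows. This is not the proposed implementation: the class name {{SidecarSkeleton}}, the {{Endpoint}} helper, reading the options from JVM system properties, the rule that a value starting with "/" is a UDS path, the 200/503 status codes, and the fixed 5-second grace period are all assumptions made for the example. Only the option names, the {{/health}} path, and the {{READY}} / {{NOT_READY}} bodies come from the issue text. The RPC service and model loading are omitted.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical skeleton; all names here are illustrative, not part of Flink. */
final class SidecarSkeleton {

    /** Parsed form of sidecar.rpc.endpoint: a UDS path or a TCP host:port. */
    static final class Endpoint {
        final String udsPath; // non-null for a Unix domain socket
        final String host;    // non-null for TCP
        final int port;       // -1 for a Unix domain socket

        private Endpoint(String udsPath, String host, int port) {
            this.udsPath = udsPath;
            this.host = host;
            this.port = port;
        }

        static Endpoint parse(String value) {
            // Assumption: a leading '/' means a UDS path, anything else is host:port.
            if (value.startsWith("/")) {
                return new Endpoint(value, null, -1);
            }
            int idx = value.lastIndexOf(':');
            if (idx <= 0) {
                throw new IllegalArgumentException(
                        "Expected a UDS path or host:port, got: " + value);
            }
            int port = Integer.parseInt(value.substring(idx + 1));
            return new Endpoint(null, value.substring(0, idx), port);
        }
    }

    private final AtomicBoolean ready = new AtomicBoolean(false);
    private HttpServer healthServer;

    String healthBody() {
        return ready.get() ? "READY" : "NOT_READY";
    }

    void markReady() {
        ready.set(true);
    }

    /** Starts the /health endpoint; port 0 picks a free port. Returns the bound port. */
    int startHealthServer(int port) throws IOException {
        healthServer = HttpServer.create(new InetSocketAddress(port), 0);
        healthServer.createContext("/health", exchange -> {
            boolean isReady = ready.get();
            byte[] body = (isReady ? "READY" : "NOT_READY")
                    .getBytes(StandardCharsets.UTF_8);
            // Assumption: 200 when ready, 503 otherwise.
            exchange.sendResponseHeaders(isReady ? 200 : 503, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        healthServer.start();
        return healthServer.getAddress().getPort();
    }

    /** Flips to NOT_READY, then stops the health server within graceSeconds. */
    void stop(int graceSeconds) {
        ready.set(false);
        if (healthServer != null) {
            healthServer.stop(graceSeconds);
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumption: options are passed as -Dsidecar.rpc.endpoint=... etc.
        Endpoint endpoint = Endpoint.parse(System.getProperty("sidecar.rpc.endpoint"));
        int healthPort = Integer.parseInt(System.getProperty("sidecar.health.port"));

        SidecarSkeleton sidecar = new SidecarSkeleton();
        CountDownLatch stopped = new CountDownLatch(1);
        // The JVM runs shutdown hooks on SIGTERM; 5 s is a placeholder grace period.
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            sidecar.stop(5);
            stopped.countDown();
        }));

        sidecar.startHealthServer(healthPort);
        // The RPC server would bind to `endpoint` here before reporting READY.
        sidecar.markReady();
        stopped.await();
    }
}
```

Under these assumptions, launching with {{-Dsidecar.rpc.endpoint=localhost:9090 -Dsidecar.health.port=8081}} would serve {{READY}} on {{/health}} until the process receives SIGTERM. The JDK's built-in {{com.sun.net.httpserver.HttpServer}} is used only to keep the sketch dependency-free.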



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
