Hi everyone,


I'd like to start a discussion on FLIP-592: First-Class Accelerator
Resource Support [1].


As noted in FLIP-577 (AI-Native Flink) [2], with the growth of
AI-oriented workloads, accelerators (GPUs, NPUs, TPUs) have become
essential resources for Flink jobs. The existing ExternalResource
framework (FLIP-108) [3] provides a generic abstraction, but lacks
dedicated accelerator APIs and resource allocation strategies optimized
for accelerator utilization.


This FLIP proposes elevating accelerators to first-class resources with
end-to-end native support. The proposal focuses on:


- Dedicated accelerator resource declaration APIs and configurations,
  with K8s/YARN deployment integration
- A new resource allocation strategy that supports heterogeneous TM
  provisioning, isolating CPU-only and accelerator-equipped TMs to
  improve accelerator utilization
- An SPI-based framework for device discovery and metrics collection,
  with built-in support for NVIDIA GPUs


All new capabilities are optional and fully backward compatible.


Looking forward to your feedback!


Best regards,
Yi Zhang


[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-592%3A+First-Class+Accelerator+Resource+Support
[2] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=421957275
[3] https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
