[
https://issues.apache.org/jira/browse/YUNIKORN-3120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Peter Bacsko updated YUNIKORN-3120:
-----------------------------------
Target Version: 1.9.0 (was: 1.8.0)
> Enhance Scheduling Latency Metrics with Allocation State Labels
> ---------------------------------------------------------------
>
> Key: YUNIKORN-3120
> URL: https://issues.apache.org/jira/browse/YUNIKORN-3120
> Project: Apache YuniKorn
> Issue Type: Improvement
> Components: core - scheduler
> Reporter: Mit Desai
> Assignee: Mit Desai
> Priority: Major
>
> h3. Summary
> Enhance the existing scheduling latency metrics by adding state labels that
> distinguish scheduling cycles that result in a successful pod allocation
> from cycles that find no suitable allocation. This makes scheduling
> performance issues easier to debug.
> h3. Background
> Currently, YuniKorn's {{yunikorn_scheduler_scheduling_latency_milliseconds}}
> metric aggregates all scheduling cycles together, making it difficult to
> distinguish between:
> # {*}Allocation cycles{*}: Cycles where the scheduler successfully finds and
> allocates resources for pending applications
> # {*}Non-allocation cycles{*}: Cycles where the scheduler runs but cannot
> find suitable allocations due to resource constraints, policy restrictions,
> or other factors
> This lack of distinction makes it challenging to debug scheduling latency
> issues, as operators cannot easily identify whether high latency is due to
> complex allocation decisions or repeated failed allocation attempts.
> h3. Implementation Details
> # {*}Metric Enhancement{*}: Add a state label to the existing histogram metric
> # {*}Cycle Tracking{*}: Track allocation success or failure in the scheduling loop
> # {*}Threshold Logging{*}: Add a configurable threshold for detailed
> non-allocation logging
> # {*}Documentation{*}: Update the monitoring guides and dashboard examples
> h3. Backward Compatibility
> * Existing metric queries continue to work unchanged
> * Additive enhancement that doesn't break existing monitoring setups
> * Optional detailed logging that can be configured based on operational needs