Mit Desai created YUNIKORN-3084:
-----------------------------------
Summary: Fix Inconsistency in Allocation Removal from sortedRequests
Key: YUNIKORN-3084
URL: https://issues.apache.org/jira/browse/YUNIKORN-3084
Project: Apache YuniKorn
Issue Type: Bug
Components: core - scheduler
Affects Versions: 1.6.3, 1.6.2, 1.6.1, 1.6.0
Reporter: Mit Desai
Assignee: Mit Desai
The _remove_ method of _sortedRequests_ in the scheduler removes allocations
inconsistently. An allocation can be removed from the application's
allocations map but remain in the _sortedRequests_ list, where it consumes
scheduler cycles for non-existent pods. It can also leave some allocations
unremoved, so the application stays in the UI in a _New_ state even though it
has already been removed from the cluster.
The current implementation of the _remove_ method in _sortedRequests_ uses a
binary search with the _LessThan_ comparison function to find the allocation
to remove. This approach is flawed because:
1. The binary search looks for an allocation by priority and creation time,
not by its unique allocation key.
2. When multiple allocations have the same priority and creation time, the
binary search may find the wrong allocation.
3. As a result, allocations are not reliably removed from the
_sortedRequests_ list.
In short, _LessThan_ orders allocations by priority and creation time only, so
allocations that tie on both compare as equal. The binary search can then land
on any entry in the tied run, which may be a different allocation than the one
we want to remove.
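
The failure mode described above can be reproduced in isolation. The sketch
below is not the YuniKorn code: the alloc type, the lessThan ordering (higher
priority first, then older first) and both remove helpers are hypothetical
simplifications written only to illustrate the tie problem. removeByOrder
trusts the index returned by the binary search, while removeByKey uses the
search only to find where the tied run starts and then matches on the key,
which is one possible way to make removal consistent.

```go
package main

import "sort"

// alloc is a hypothetical, simplified stand-in for a scheduler allocation.
type alloc struct {
	key      string // unique allocation key
	priority int32
	created  int64 // creation time, e.g. Unix nanoseconds
}

// lessThan orders by priority (higher first), then creation time (older first).
// This ordering is an assumption for the sketch. It never looks at key, so
// two distinct allocations can compare as equal.
func lessThan(a, b alloc) bool {
	if a.priority == b.priority {
		return a.created < b.created
	}
	return a.priority > b.priority
}

// firstNotLess returns the index of the first entry that does not sort
// before target, or len(reqs) if there is none. reqs must be sorted.
func firstNotLess(reqs []alloc, target alloc) int {
	return sort.Search(len(reqs), func(i int) bool {
		return !lessThan(reqs[i], target)
	})
}

// removeByOrder illustrates the flaw: it removes whatever entry the binary
// search lands on, without checking that the key matches.
func removeByOrder(reqs []alloc, target alloc) []alloc {
	idx := firstNotLess(reqs, target)
	if idx == len(reqs) {
		return reqs
	}
	// The three-index slice forces append to copy, leaving reqs untouched.
	return append(reqs[:idx:idx], reqs[idx+1:]...)
}

// removeByKey scans the run of entries that tie with target and removes
// only the entry whose key matches. It returns reqs unchanged if none does.
func removeByKey(reqs []alloc, target alloc) []alloc {
	for i := firstNotLess(reqs, target); i < len(reqs) && !lessThan(target, reqs[i]); i++ {
		if reqs[i].key == target.key {
			return append(reqs[:i:i], reqs[i+1:]...)
		}
	}
	return reqs
}
```

With three allocations a, b and c in that order, where a and b share the same
priority and creation time, asking removeByOrder to remove b drops a instead
and leaves b in the list. removeByKey on the same input drops b and keeps a.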