[
https://issues.apache.org/jira/browse/YUNIKORN-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17907213#comment-17907213
]
Paul Santa Clara commented on YUNIKORN-3003:
--------------------------------------------
Just a bit of extra context for anyone interested.
Given 4 sibling queues with the following guarantees:
tier 0: 46.66 vcores
tier 1: 23.33 vcores
tier 2: 14 vcores
tier 3: 9.33 vcores
Run a single, identical Spark job in each. Start with tier-3, then tier-2,
tier-1 and finally tier-0. Wait 60 seconds between runs to give the
'higher' (higher-numbered) tiers a head start. This will force preemption as
YuniKorn attempts to converge on the guarantees.

See attachment [^yunikorn-1.6.0-preemption-broken.png]
for results without the patch. Notice how preemption stalls midway through the
test.
[^yunikorn-1.6.0-preemption-fixed.png]
shows that after the patch, the scheduler is able to continue preemption until
it converges on fairness.
> Previously preempted allocations can be preempted again
> -------------------------------------------------------
>
> Key: YUNIKORN-3003
> URL: https://issues.apache.org/jira/browse/YUNIKORN-3003
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: core - scheduler
> Reporter: Paul Santa Clara
> Assignee: Paul Santa Clara
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.7.0, 1.6.1
>
> Attachments: yunikorn-1.6.0-preemption-broken.png,
> yunikorn-1.6.0-preemption-fixed.png
>
>
> This is particularly apparent when preemption is used with longer values of
> terminationGracePeriodSeconds. A task can be selected for preemption and,
> during its graceful shutdown period, that same task can be preempted again.
> When this occurs, the preemptingResource for the impacted queue is
> incremented again for the SAME task, preventing it from ever returning to
> zero, even after all tasks have fully completed their termination and
> notified the core scheduler. After stopping all workloads, the
> preemptingResource will remain positive unless the yunikorn scheduler pod is
> restarted:
> ```
> "preemptingResource": { "ephemeral-storage": 21474836480,
> "memory": 4194304000, "pods": 1, "vcore": 1000 },
> ```
> This preemptingResource leak, in turn, may convince the scheduler to avoid
> future preemptions, because the leaked value is used when computing the
> actual used resources for a given queue:
> [https://github.com/apache/yunikorn-core/blob/v1.6.0/pkg/scheduler/objects/preemption.go#L826]
>
>
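To make the leak described above concrete, here is a minimal sketch in Go. It is not the yunikorn-core code: the `queue` type, its fields, and all method names are hypothetical, only vcore is tracked, and computing "used" as allocated minus preempting is an assumption based on the description. The guarded variant only illustrates the idea of skipping a task that is already marked; the actual patch may work differently.

```go
package main

import "fmt"

// queue is a hypothetical, simplified stand-in for a scheduler queue.
// It tracks a single resource (vcore, in millicores) and is NOT the
// real yunikorn-core type.
type queue struct {
	allocated  int64           // resources currently allocated in the queue
	preempting int64           // resources of tasks marked for preemption
	marked     map[string]bool // tasks already marked for preemption
}

func newQueue(allocated int64) *queue {
	return &queue{allocated: allocated, marked: make(map[string]bool)}
}

// markPreemptedUnguarded models the reported behaviour: every selection
// increments the counter, even if the task is already shutting down.
func (q *queue) markPreemptedUnguarded(task string, size int64) {
	q.preempting += size
	q.marked[task] = true
}

// markPreemptedGuarded skips a task that is already marked, so the
// counter is incremented at most once per task.
func (q *queue) markPreemptedGuarded(task string, size int64) {
	if q.marked[task] {
		return
	}
	q.preempting += size
	q.marked[task] = true
}

// release models a task completing its termination: the counters are
// decremented exactly once, no matter how often the task was selected.
func (q *queue) release(task string, size int64) {
	if !q.marked[task] {
		return
	}
	delete(q.marked, task)
	q.preempting -= size
	q.allocated -= size
}

// used is an assumed formula: allocated minus what is being preempted.
func (q *queue) used() int64 {
	return q.allocated - q.preempting
}

func main() {
	leaky := newQueue(1000)
	leaky.markPreemptedUnguarded("task-1", 1000)
	leaky.markPreemptedUnguarded("task-1", 1000) // selected again during shutdown
	leaky.release("task-1", 1000)
	fmt.Println(leaky.preempting, leaky.used()) // prints "1000 -1000"

	fixed := newQueue(1000)
	fixed.markPreemptedGuarded("task-1", 1000)
	fixed.markPreemptedGuarded("task-1", 1000) // ignored: already marked
	fixed.release("task-1", 1000)
	fmt.Println(fixed.preempting, fixed.used()) // prints "0 0"
}
```

In the unguarded case the queue ends up empty but still reports 1000 millicores as preempting, which matches the shape of the stuck `"vcore": 1000` value in the snippet above. Under the assumed formula this also skews the computed usage for the queue.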