[ 
https://issues.apache.org/jira/browse/YUNIKORN-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17907213#comment-17907213
 ] 

Paul Santa Clara commented on YUNIKORN-3003:
--------------------------------------------

Just a bit of extra context for anyone interested.

Given 4 sibling queues with the following guarantees:

tier 0: 46.66 vcores
tier 1: 23.33 vcores
tier 2: 14 vcores
tier 3: 9.33 vcores

Run a single, identical Spark job in each. Start with tier-3, then tier-2, 
tier-1 and finally tier-0. Wait 60 seconds between runs to give the 
higher-numbered tiers (those with smaller guarantees) a head start. This will 
force preemption as YuniKorn attempts to converge on the guarantees.

See attachment [^yunikorn-1.6.0-preemption-broken.png]
for results without the patch.  Notice how preemption stalls midway through the 
test.



[^yunikorn-1.6.0-preemption-fixed.png]
shows that after the patch, the scheduler is able to continue preemption until 
it converges on fairness.


> Previously preempted allocations can be preempted again
> -------------------------------------------------------
>
>                 Key: YUNIKORN-3003
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-3003
>             Project: Apache YuniKorn
>          Issue Type: Bug
>          Components: core - scheduler
>            Reporter: Paul Santa Clara
>            Assignee: Paul Santa Clara
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.7.0, 1.6.1
>
>         Attachments: yunikorn-1.6.0-preemption-broken.png, 
> yunikorn-1.6.0-preemption-fixed.png
>
>
> This is particularly apparent when preemption is used with longer values of 
> terminationGracePeriodSeconds.  A task can be selected for preemption and, 
> during its graceful shutdown period, that same task can be preempted again.
> When this occurs, the preemptingResource for the impacted queue is 
> incremented a second time for the SAME task, preventing it from ever 
> returning to zero, even after all tasks have fully completed their 
> termination and notified the core scheduler.  After stopping all workloads, 
> the preemptingResource will remain positive unless the YuniKorn scheduler 
> pod is restarted:
> ```
> "preemptingResource": { "ephemeral-storage": 21474836480, 
> "memory": 4194304000, "pods": 1, "vcore": 1000 },
> ``` 
> This preemptingResource leak, in turn, may cause the scheduler to skip 
> future preemptions, because it uses this value when computing the resources 
> a queue is actually using: 
> [https://github.com/apache/yunikorn-core/blob/v1.6.0/pkg/scheduler/objects/preemption.go#L826]
>  
>  
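
To make the leak described above concrete, here is a minimal Go sketch of the accounting. It is not taken from yunikorn-core: the `Resource`, `Allocation`, and `Queue` types, the `preempted` flag, and all function names are hypothetical stand-ins, and the guard shown is only one plausible way to avoid the double count, not necessarily what the patch does. The sketch marks the same task for preemption twice during its grace period, releases it once on termination, and reports what is left in the queue's preempting total.

```go
package main

import "fmt"

// Resource is a simplified, hypothetical stand-in for a scheduler resource map.
type Resource map[string]int64

func (r Resource) add(o Resource) {
	for k, v := range o {
		r[k] += v
	}
}

func (r Resource) sub(o Resource) {
	for k, v := range o {
		r[k] -= v
	}
}

// Allocation is a hypothetical task allocation; preempted records whether
// it has already been selected as a preemption victim.
type Allocation struct {
	Res       Resource
	preempted bool
}

// Queue tracks the resources of tasks that are currently being preempted.
type Queue struct {
	Preempting Resource
}

// markNaive increments the preempting total on every call, so selecting the
// same task twice counts its resources twice.
func (q *Queue) markNaive(a *Allocation) {
	q.Preempting.add(a.Res)
}

// markOnce counts an allocation at most once, however often it is selected.
func (q *Queue) markOnce(a *Allocation) {
	if a.preempted {
		return
	}
	a.preempted = true
	q.Preempting.add(a.Res)
}

// release runs once, when the task has terminated and reported back.
func (q *Queue) release(a *Allocation) {
	q.Preempting.sub(a.Res)
}

// leakedVcore selects one task for preemption twice, releases it once, and
// returns the vcore amount still counted as preempting afterwards.
func leakedVcore(guarded bool) int64 {
	q := &Queue{Preempting: Resource{}}
	a := &Allocation{Res: Resource{"vcore": 1000, "pods": 1}}
	for i := 0; i < 2; i++ {
		if guarded {
			q.markOnce(a)
		} else {
			q.markNaive(a)
		}
	}
	q.release(a)
	return q.Preempting["vcore"]
}

func main() {
	fmt.Println("naive leak:  ", leakedVcore(false)) // 1000
	fmt.Println("guarded leak:", leakedVcore(true))  // 0
}
```

In the unguarded case the leftover 1000 vcore never drains, because the only decrement happens once at termination.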



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
