[ https://issues.apache.org/jira/browse/YUNIKORN-3007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17923151#comment-17923151 ]

Wilfred Spiegelenburg commented on YUNIKORN-3007:
-------------------------------------------------

I completely agree, we should not remove reservations.

Simple example: I run all nodes with 128 GB of memory available for pods. 
DaemonSets take about 2 GB on each node, leaving 126 GB for end user pods. I 
have a large number of users who all submit jobs with "small" pods. The load 
submitted by these users is large enough to keep the cluster continually 
(24/7) fully utilised. 

A user submits a job that requests pods that use 64 GB. Only one of those 
pods can run on a node. Without reservations, the chance that this job will 
ever get to run is negligible: every time a small gap opens up on a node, the 
smaller requests fill it, and the large request never gets to run. This is an 
extreme case, but the same can happen on a loaded cluster with some relatively 
large requests: 32 GB vs 4 GB, or 4 CPU vs 1 CPU.
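
To make that concrete, here is a toy Go simulation of the dynamic 
(illustrative numbers and names only, this is not YuniKorn code):

    package main

    import "fmt"

    // simulate counts scheduling cycles until a 64 GB request fits on a
    // 126 GB node that starts fully utilised, assuming the backlog of
    // small requests never empties. Toy model only, not YuniKorn code.
    func simulate(reserve bool) int {
        const nodeCap, small, large = 126, 4, 64
        used := nodeCap // the node starts fully utilised
        for cycle := 1; cycle <= 1000; cycle++ {
            used -= small // one small pod finishes, a gap opens
            if nodeCap-used >= large {
                return cycle // the 64 GB request finally fits
            }
            if !reserve {
                used += small // no reservation: the backlog refills the gap
            }
            // with a reservation the gap is held for the large request
        }
        return -1 // effectively never scheduled
    }

    func main() {
        fmt.Println("with reservation:", simulate(true))     // 16
        fmt.Println("without reservation:", simulate(false)) // -1
    }

With a reservation the node drains and the large pod lands after 16 cycles; 
without one, the 4 GB gap is refilled every cycle and the 64 GB request 
starves indefinitely.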

I agree that we could look at some enhancements, but removing reservations, 
or even turning them off, can have huge impacts and cause job starvation. 
That could then lead to increased preemption and a far less predictable 
scheduling cycle.

> Improve YuniKorn reservation logic
> ----------------------------------
>
>                 Key: YUNIKORN-3007
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-3007
>             Project: Apache YuniKorn
>          Issue Type: Improvement
>          Components: core - scheduler
>            Reporter: Rainie Li
>            Assignee: Rainie Li
>            Priority: Major
>         Attachments: queue.yaml, test-job1.yaml, test-job2.yaml, 
> test-job3.yaml
>
>
> *Issue and Investigation:*
> We’ve observed Spark job slowness on our prod cluster, especially when 
> large jobs are running. This performance degradation impacts user 
> experience.
> Under high cluster utilization with numerous pending pods, large jobs that 
> arrive first reserve resources on nodes. This reservation mechanism 
> prevents new jobs from getting the resources they need, which works 
> against preemption.
> *Test case:*
> Please refer to the attached files. 
>  # Submit test-job1 to queue-one
>  # Once test-job1 is running, submit test-job2 to queue-two
>  # Once test-job2 is running and pending memory exceeds 40 TB, submit 
> test-job3 to queue-three
> *Proposal:*
> YuniKorn incorporates multiple scenarios for making reservations. To address 
> the current issue, we propose retaining only the preemption-related 
> reservations, as preemption relies on reservations to ensure that resources 
> can be reallocated later.
> The rationale for removing the other reservation scenarios is as follows 
> (a sketch of the implied rule follows the list):
>  # If a queue's usage exceeds its guaranteed resources, it should not 
> maintain reservations.
>  # Conversely, if a queue's usage falls below its guaranteed resources, it 
> should be able to secure resources through preemption.
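> The rule these two points imply could look roughly like the following 
> minimal Go sketch (hypothetical names, not the actual yunikorn-core API):
>
>     package sketch
>
>     // shouldReserve expresses the proposed policy. Hypothetical
>     // sketch with made-up names, not the actual yunikorn-core API.
>     func shouldReserve(preemptionTriggered bool, usage, guaranteed int64) bool {
>         if preemptionTriggered {
>             // preemption relies on the reservation so freed resources
>             // are not grabbed by another ask before reallocation
>             return true
>         }
>         if usage > guaranteed {
>             return false // over guarantee: no right to hold resources
>         }
>         // under guarantee: preemption will secure the resources, so a
>         // plain allocation-time reservation is unnecessary as well
>         return false
>     }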
> *Our fix:* 
> We applied a fix internally that removes the allocation reservation case 
> here: 
> [https://github.com/apache/yunikorn-core/blob/master/pkg/scheduler/objects/application.go#L1532]
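> Roughly, the shape of the internal change is to guard that reservation 
> step so only preemption-backed asks take it (a paraphrase with 
> hypothetical names, not the literal diff):
>
>     // Before: an ask that fits a node but cannot allocate immediately
>     // could reserve the node. After: regular allocations skip the
>     // reserve step entirely. Hypothetical paraphrase of the change.
>     if ask.preemptionTriggered { // hypothetical field
>         app.reserveNode(node, ask) // hypothetical helper
>     }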
>  
>  
> Reservation 
> [https://yunikorn.apache.org/release-announce/0.8.0/#resource-reservation] 
> appears to be by design, but in our case it works against preemption.
> I would like to open this ticket for a follow-up discussion with the 
> community about the best way to address this issue. cc [~wilfreds] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
