[ https://issues.apache.org/jira/browse/YUNIKORN-3007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rainie Li updated YUNIKORN-3007:
--------------------------------
    Description: 
*Issue and Investigation:*

We’ve observed Spark job slowness on our production cluster, especially when 
large jobs are running. This performance degradation impacts user experience.

When cluster utilization is high and many pods are pending, large jobs that 
arrive first can reserve resources on nodes. This reservation mechanism 
prevents newer jobs from getting the resources they need, and it works against 
preemption.

*Test case:*

Please refer to the attached files.
 # Submit test-job1 to queue-one
 # Once test-job1 is running, submit test-job2 to queue-two
 # Once test-job2 is running and pending memory exceeds 40 TB, submit 
test-job3 to queue-three

*Proposal:*

YuniKorn makes reservations in multiple scenarios. To address the current 
issue, we propose retaining only the preemption-related reservations, since 
preemption relies on reservations to ensure that resources can be reallocated 
later.

The rationale for removing other reservation scenarios is as follows:
 # If a queue's usage exceeds its guaranteed resources, it should not maintain 
reservations.
 # Conversely, if a queue's usage falls below its guaranteed resources, it 
should be able to secure resources through preemption.
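To make the proposed rule concrete, here is a minimal Go sketch of the decision; the names (Resources, shouldKeepReservation) are hypothetical stand-ins for illustration, not the real yunikorn-core types:

{code:go}
package main

import "fmt"

// Resources is a simplified stand-in for the scheduler's resource type.
type Resources struct {
	MemoryGiB int64
	VCores    int64
}

// FitsIn reports whether r fits within other on every dimension.
func (r Resources) FitsIn(other Resources) bool {
	return r.MemoryGiB <= other.MemoryGiB && r.VCores <= other.VCores
}

// shouldKeepReservation encodes the proposal: only preemption-related
// reservations are kept; in every other case the queue's position relative
// to its guaranteed resources means a reservation is unnecessary.
func shouldKeepReservation(forPreemption bool, queueUsed, queueGuaranteed Resources) bool {
	// Preemption relies on the reservation to make sure the resources it
	// frees can be handed to the waiting ask later, so always keep it.
	if forPreemption {
		return true
	}
	if !queueUsed.FitsIn(queueGuaranteed) {
		// Rationale 1: the queue already exceeds its guaranteed resources,
		// so it should not hold a reservation.
		return false
	}
	// Rationale 2: the queue is below its guaranteed resources, so it can
	// secure resources through preemption instead of reserving.
	return false
}

func main() {
	used := Resources{MemoryGiB: 500, VCores: 100}
	guaranteed := Resources{MemoryGiB: 400, VCores: 80}
	fmt.Println(shouldKeepReservation(true, used, guaranteed))  // true: preemption reservation
	fmt.Println(shouldKeepReservation(false, used, guaranteed)) // false: queue is over its guarantee
}
{code}

In other words, under this proposal a reservation only survives when preemption created it; a queue's position relative to its guarantee never justifies one on its own.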

*Our fix:* 

We applied the fix internally to remove the allocation case here: 
[https://github.com/apache/yunikorn-core/blob/master/pkg/scheduler/objects/application.go#L1532]
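For discussion, here is a simplified Go sketch of the shape of that internal change; all names (Resource, Node, AllocationAsk, tryAllocateOnNode) are hypothetical stand-ins, not the actual code at that line:

{code:go}
// Illustrative only: hypothetical types and function names used to show the
// shape of the internal change, not the real yunikorn-core implementation.
package scheduler

// Resource is a simplified stand-in for the scheduler's resource type.
type Resource struct{ MemoryGiB int64 }

// Node is a simplified stand-in for a schedulable node.
type Node struct {
	Available Resource
	Reserved  bool
}

// AllocationAsk is a simplified stand-in for a pending request.
type AllocationAsk struct{ Requested Resource }

func (n *Node) canAllocate(r Resource) bool {
	return r.MemoryGiB <= n.Available.MemoryGiB
}

// tryAllocateOnNode shows the behaviour change: when a regular allocation
// attempt does not fit on the node, the scheduler no longer falls back to
// reserving that node; only a preemption-driven attempt still reserves,
// because preemption needs the reservation to claim the resources it frees.
func tryAllocateOnNode(node *Node, ask *AllocationAsk, triggeredByPreemption bool) bool {
	if node.canAllocate(ask.Requested) {
		node.Available.MemoryGiB -= ask.Requested.MemoryGiB
		return true
	}
	// Before our fix the node could also be reserved here for regular asks,
	// which is the behaviour that blocked later jobs on our clusters.
	if triggeredByPreemption {
		node.Reserved = true
	}
	return false
}
{code}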
 

 

It seems reservation 
[https://yunikorn.apache.org/release-announce/0.8.0/#resource-reservation] is 
by design, but in our case it works against preemption.

I would like to open this ticket for a follow-up discussion with the community 
on the best way to address this issue. cc [~wilfreds]

> Improve YuniKorn reservation logic
> ----------------------------------
>
>                 Key: YUNIKORN-3007
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-3007
>             Project: Apache YuniKorn
>          Issue Type: Improvement
>          Components: core - scheduler
>            Reporter: Rainie Li
>            Assignee: Rainie Li
>            Priority: Major
>         Attachments: queue.yaml, test-job1.yaml, test-job2.yaml, 
> test-job3.yaml
>


