[ 
https://issues.apache.org/jira/browse/YUNIKORN-21?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17092340#comment-17092340
 ] 

Wangda Tan commented on YUNIKORN-21:
------------------------------------

Several examples of node sorting policies:

1) Bin-packing policy: Sort the node list with the most-used nodes first.

2) Peanut-buttering policy: Sort the node list with the least-used nodes first.

3) Best-fit policy: Sort the node list by how closely each node's available
resource vector matches the requested resource.
(For example, if the requested resource is cpu=2,mem=3, a node with available
resource cpu=4,mem=6 is a better "fit" than a node with available resource
cpu=6,mem=4. Reference paper:
https://www.cs.cmu.edu/~xia/resources/Documents/grandl_sigcomm14.pdf)

Policies 1 and 2 do not depend on the request, while policy 3 does. I'm
wondering how the proposal handles these different use cases.
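
One possible way to cover both kinds of policy is a scoring function that
always receives the request and simply ignores it when it is not needed. The
sketch below is only an illustration: Resource, Node, Score, and SortNodes are
hypothetical names, not YuniKorn types. Usage is measured on a single resource
type, and cosine similarity is used as one possible "fit" measure (an
assumption; the referenced paper may define fit differently).

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Resource is a hypothetical resource vector keyed by resource name.
type Resource map[string]float64

// Node is a hypothetical node with total and currently available resources.
type Node struct {
	Name      string
	Total     Resource
	Available Resource
}

// Score rates a node for a request; higher scores sort first.
// Request-independent policies ignore req.
type Score func(n Node, req Resource) float64

// usedShare returns the used fraction of one resource type on the node.
func usedShare(n Node, res string) float64 {
	if n.Total[res] == 0 {
		return 0
	}
	return 1 - n.Available[res]/n.Total[res]
}

// BinPacking prefers the most-used nodes (request-independent).
func BinPacking(res string) Score {
	return func(n Node, _ Resource) float64 { return usedShare(n, res) }
}

// Spread ("peanut-buttering") prefers the least-used nodes (request-independent).
func Spread(res string) Score {
	return func(n Node, _ Resource) float64 { return -usedShare(n, res) }
}

// BestFit scores by cosine similarity between the request and the node's
// available resources, computed over the resource types in the request.
func BestFit(n Node, req Resource) float64 {
	var dot, nr, na float64
	for k, rv := range req {
		av := n.Available[k]
		dot += rv * av
		nr += rv * rv
		na += av * av
	}
	if nr == 0 || na == 0 {
		return 0
	}
	return dot / (math.Sqrt(nr) * math.Sqrt(na))
}

// SortNodes orders nodes by descending score for the given request.
func SortNodes(nodes []Node, req Resource, score Score) {
	sort.SliceStable(nodes, func(i, j int) bool {
		return score(nodes[i], req) > score(nodes[j], req)
	})
}

func main() {
	total := Resource{"cpu": 8, "mem": 8}
	nodes := []Node{
		{Name: "b", Total: total, Available: Resource{"cpu": 6, "mem": 4}},
		{Name: "a", Total: total, Available: Resource{"cpu": 4, "mem": 6}},
	}
	req := Resource{"cpu": 2, "mem": 3}

	SortNodes(nodes, req, BinPacking("cpu"))
	fmt.Println(nodes[0].Name, nodes[1].Name) // a b

	SortNodes(nodes, req, Spread("cpu"))
	fmt.Println(nodes[0].Name, nodes[1].Name) // b a

	SortNodes(nodes, req, BestFit)
	fmt.Println(nodes[0].Name, nodes[1].Name) // a b
}
```

With this shape, policies 1 and 2 could still be pre-sorted once per scheduling
cycle, while policy 3 would need a sort (or at least a scoring pass) per
request, which is the cost the proposal would have to account for.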

Also, it will be important to make sure the node sorting policy can be used by
the preemption logic.

> Revisit node sorting algorithm for fairness
> -------------------------------------------
>
>                 Key: YUNIKORN-21
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-21
>             Project: Apache YuniKorn
>          Issue Type: Improvement
>          Components: core - scheduler
>            Reporter: Wangda Tan
>            Priority: Major
>         Attachments: Improve node sorting algorithm v1.pdf, Improve node 
> sorting algorithm v2.pdf
>
>
> Currently, we're using DominantRatio for the node sorting algorithm
> {code:java}
> func CompUsageShares(left, right *Resource) int {
>  lshares := getShares(left, nil)
>  rshares := getShares(right, nil)
>  return compareShares(lshares, rshares) 
> }{code}
> This is not good, for two reasons:
>  # Dominant resource comparison is about 8X more expensive than a single float 
> comparison for two resource types.
>  # Dominant resource comparison is not stable when we have scarce resource 
> types like GPU. Given a node with 192GB mem, 32 vcores, and 1 GPU available 
> and a node with 168GB mem, 64 vcores, and 8 GPU available, the former can go 
> first because of the following logic:
> {code:java}
> if total == nil || total.Resources[k] == 0 {
>  // negative share is logged
>  if v < 0 {
>   log.Logger().Debug("usage is negative no total, share is also negative",
>    zap.Int64("resource quantity", int64(v)))
>  }
>  shares[idx] = float64(v)
>  idx++
>  continue
> }{code}
> I think we should discard dominant resource comparison for node resources. 
> Instead, we should just use one resource type (like vcores) to compare 
> available resources.
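
The instability described in the issue, and the single-resource alternative it
proposes, can be illustrated with a small sketch. This is not the YuniKorn
implementation: Resource, rawShares, compareDominant, and compareSingle are
hypothetical stand-ins. It assumes that, with no total, each quantity is used
directly as its share (as in the quoted branch) and that shares are compared
largest first.

```go
package main

import (
	"fmt"
	"sort"
)

// Resource is a hypothetical map of resource name to available quantity.
type Resource map[string]int64

// rawShares mimics the no-total branch: every quantity becomes its own share.
// Shares are sorted in descending order (an assumption of this sketch).
func rawShares(r Resource) []float64 {
	shares := make([]float64, 0, len(r))
	for _, v := range r {
		shares = append(shares, float64(v))
	}
	sort.Sort(sort.Reverse(sort.Float64Slice(shares)))
	return shares
}

// compareDominant compares share lists starting from the largest share.
// It returns 1 if left is larger, -1 if right is larger, 0 if the common
// prefix of both lists is equal.
func compareDominant(left, right Resource) int {
	l, r := rawShares(left), rawShares(right)
	for i := 0; i < len(l) && i < len(r); i++ {
		if l[i] > r[i] {
			return 1
		}
		if l[i] < r[i] {
			return -1
		}
	}
	return 0
}

// compareSingle compares one resource type only, as proposed in the issue.
func compareSingle(left, right Resource, res string) int {
	switch {
	case left[res] > right[res]:
		return 1
	case left[res] < right[res]:
		return -1
	}
	return 0
}

func main() {
	// Available resources from the example in the issue (memory in GB).
	a := Resource{"memory": 192, "vcore": 32, "gpu": 1}
	b := Resource{"memory": 168, "vcore": 64, "gpu": 8}

	// Memory dominates the raw shares, so node a ranks above node b even
	// though b has more vcores and GPUs available.
	fmt.Println(compareDominant(a, b)) // 1

	// Comparing vcores only ranks node b above node a.
	fmt.Println(compareSingle(a, b, "vcore")) // -1
}
```

The single-resource comparison is one integer compare per pair of nodes, which
is consistent with the cost argument in the issue, at the price of ignoring
every other resource type.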


