[ 
https://issues.apache.org/jira/browse/YUNIKORN-21?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17092781#comment-17092781
 ] 

Wangda Tan commented on YUNIKORN-21:
------------------------------------

Thanks [~Tao Yang], the answer looks reasonable to me, and the numbers also 
look very promising.

bq. The scheduling throughput can be improved from 450 to 5000+ in a mock 
cluster with 1000 nodes according to the benchmark results of 
scheduler_perf_test.go in my local test.

I reread the design doc and understand it better now. At a high level, the 
class design makes sense. If you can put together a PoC patch, I can help 
review it and give more detailed suggestions.

On further thought, the weighted sort policy may make sense if the number of 
scorers involved is small (no more than 3). I want to avoid a situation where 
20 scorers feed into a weighted score that we cannot explain at all.
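To illustrate the explainability concern, here is a minimal Go sketch of a weighted policy that accepts at most three scorers and reports each scorer's contribution alongside the total. All names (nodeView, scorer, weightedPolicy, maxScorers, freeRatio) are hypothetical and not part of YuniKorn; the sketch assumes each scorer returns a value in [0, 1] and that weights are positive.

```go
package main

import "fmt"

// nodeView is a hypothetical, simplified view of a node's resources.
type nodeView struct {
	Name          string
	AvailVcores   float64
	TotalVcores   float64
	AvailMemoryMB float64
	TotalMemoryMB float64
}

// scorer is a hypothetical named scoring function; Score is assumed to
// return a value in [0, 1].
type scorer struct {
	Name   string
	Weight float64
	Score  func(n nodeView) float64
}

// contribution records how much one scorer added to the final score.
type contribution struct {
	Scorer string
	Value  float64
}

// maxScorers caps the number of combined scorers so a score stays explainable.
const maxScorers = 3

// weightedPolicy combines a small, fixed set of scorers into one weighted score.
type weightedPolicy struct {
	scorers []scorer
}

// newWeightedPolicy rejects empty or oversized scorer sets and non-positive weights.
func newWeightedPolicy(scorers ...scorer) (*weightedPolicy, error) {
	if len(scorers) == 0 || len(scorers) > maxScorers {
		return nil, fmt.Errorf("need 1 to %d scorers, got %d", maxScorers, len(scorers))
	}
	for _, s := range scorers {
		if s.Weight <= 0 {
			return nil, fmt.Errorf("scorer %q has non-positive weight %v", s.Name, s.Weight)
		}
	}
	return &weightedPolicy{scorers: scorers}, nil
}

// Explain returns each scorer's weighted contribution, in registration order,
// together with their sum.
func (p *weightedPolicy) Explain(n nodeView) ([]contribution, float64) {
	parts := make([]contribution, 0, len(p.scorers))
	total := 0.0
	for _, s := range p.scorers {
		c := s.Weight * s.Score(n)
		parts = append(parts, contribution{Scorer: s.Name, Value: c})
		total += c
	}
	return parts, total
}

// freeRatio returns avail/total, or 0 when total is not positive.
func freeRatio(avail, total float64) float64 {
	if total <= 0 {
		return 0
	}
	return avail / total
}

func main() {
	policy, err := newWeightedPolicy(
		scorer{Name: "vcores", Weight: 0.7, Score: func(n nodeView) float64 {
			return freeRatio(n.AvailVcores, n.TotalVcores)
		}},
		scorer{Name: "memory", Weight: 0.3, Score: func(n nodeView) float64 {
			return freeRatio(n.AvailMemoryMB, n.TotalMemoryMB)
		}},
	)
	if err != nil {
		panic(err)
	}
	node := nodeView{Name: "node-1", AvailVcores: 16, TotalVcores: 32, AvailMemoryMB: 32768, TotalMemoryMB: 131072}
	parts, total := policy.Explain(node)
	for _, c := range parts {
		fmt.Printf("%s: %.3f\n", c.Scorer, c.Value)
	}
	fmt.Printf("total for %s: %.3f\n", node.Name, total)
}
```

With only two or three scorers, every node's final score can be printed as a short breakdown like this, which is much harder to reason about once many scorers are mixed together.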

> Revisit node sorting algorithm for fairness
> -------------------------------------------
>
>                 Key: YUNIKORN-21
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-21
>             Project: Apache YuniKorn
>          Issue Type: Improvement
>          Components: core - scheduler
>            Reporter: Wangda Tan
>            Priority: Major
>         Attachments: Improve node sorting algorithm v1.pdf, Improve node sorting algorithm v2.pdf
>
>
> Currently, we use DominantRatio as the node sorting algorithm:
> {code:java}
> func CompUsageShares(left, right *Resource) int {
> 	// a nil total makes getShares use the raw quantities as the shares
> 	lshares := getShares(left, nil)
> 	rshares := getShares(right, nil)
> 	return compareShares(lshares, rshares)
> }{code}
> This is not good, for two reasons:
>  # Dominant resource comparison is about 8X more expensive than a single float comparison when there are two resource types.
>  # Dominant resource ordering is not stable when there are scarce resource types such as GPUs. Compare a node with 192GB memory, 32 vcores, and 1 GPU available to a node with 168GB memory, 64 vcores, and 8 GPUs available: the former can be sorted first, because with a nil total the following logic uses each raw quantity as its share, so the numerically large memory value decides the comparison:
> {code:java}
> if total == nil || total.Resources[k] == 0 {
> 	// negative share is logged
> 	if v < 0 {
> 		log.Logger().Debug("usage is negative no total, share is also negative",
> 			zap.Int64("resource quantity", int64(v)))
> 	}
> 	// no total for this resource type: the share is the raw quantity
> 	shares[idx] = float64(v)
> 	idx++
> 	continue
> }{code}
> I think we should drop dominant resource comparison for node resources and 
> instead compare available resources using a single resource type (such as 
> vcores).
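The two-node example from the issue description can be reproduced with a simplified, hypothetical re-implementation of the nil-total share logic (rawShares, compareDominant), next to the proposed single-resource comparison (compareAvailable). This is not the yunikorn-core code: it assumes memory is expressed in MB, uses a plain map instead of the Resource type, and skips the tie-break for share lists of different lengths.

```go
package main

import (
	"fmt"
	"sort"
)

// resource is a simplified, hypothetical stand-in for a node's available
// resources: resource type name -> quantity (memory assumed in MB).
type resource map[string]int64

// rawShares mimics the nil-total branch quoted in the issue: with no total,
// each non-zero quantity is used directly as its share. Sorted ascending.
func rawShares(r resource) []float64 {
	shares := make([]float64, 0, len(r))
	for _, v := range r {
		if v == 0 {
			continue
		}
		shares = append(shares, float64(v))
	}
	sort.Float64s(shares)
	return shares
}

// compareDominant walks both ascending share lists from the largest share
// down; 1 means left ranks higher, -1 means right ranks higher.
// Leftover shares (lists of different length) are ignored in this sketch.
func compareDominant(left, right []float64) int {
	for l, r := len(left)-1, len(right)-1; l >= 0 && r >= 0; l, r = l-1, r-1 {
		switch {
		case left[l] > right[r]:
			return 1
		case left[l] < right[r]:
			return -1
		}
	}
	return 0
}

// compareAvailable compares a single resource type, as proposed in the issue.
func compareAvailable(left, right resource, key string) int {
	switch {
	case left[key] > right[key]:
		return 1
	case left[key] < right[key]:
		return -1
	}
	return 0
}

func main() {
	nodeA := resource{"memory": 192 * 1024, "vcores": 32, "gpu": 1}
	nodeB := resource{"memory": 168 * 1024, "vcores": 64, "gpu": 8}

	// prints "dominant, nil total: 1" (the 192GB memory value wins)
	fmt.Println("dominant, nil total:", compareDominant(rawShares(nodeA), rawShares(nodeB)))
	// prints "vcores only: -1" (64 vcores beats 32)
	fmt.Println("vcores only:", compareAvailable(nodeA, nodeB, "vcores"))
}
```

Under this simplification, the first node ranks higher on raw shares only because its memory quantity is the largest number in either list, while the single-resource comparison ranks the second node higher on vcores.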



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
