[ https://issues.apache.org/jira/browse/YUNIKORN-21?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17092781#comment-17092781 ]
Wangda Tan commented on YUNIKORN-21:
------------------------------------

Thanks [~Tao Yang], the answer looks reasonable to me, and the numbers also look very promising.

bq. The scheduling throughput can be improved from 450 to 5000+ in a mock cluster with 1000 nodes according to the benchmark results of scheduler_perf_test.go in my local test.

I reread the design doc and think I understand it better now. At a high level, the class design makes sense, and if you can put together a PoC patch, I can help review it and give more detailed suggestions.

On further thought, I think the weighted sort policy may make sense if the number of scorers involved is small (no more than 3). I want to avoid having 20 scorers involved and producing a weighted score that we cannot explain at all.

> Revisit node sorting algorithm for fairness
> -------------------------------------------
>
>                 Key: YUNIKORN-21
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-21
>             Project: Apache YuniKorn
>          Issue Type: Improvement
>          Components: core - scheduler
>            Reporter: Wangda Tan
>            Priority: Major
>         Attachments: Improve node sorting algorithm v1.pdf, Improve node sorting algorithm v2.pdf
>
>
> Currently, we're using DominantRatio for the node sorting algorithm:
> {code:java}
> func CompUsageShares(left, right *Resource) int {
>     lshares := getShares(left, nil)
>     rshares := getShares(right, nil)
>     return compareShares(lshares, rshares)
> }{code}
> This is not good, for two reasons:
> # Dominant resource comparison is about 8X more expensive than a single float comparison for two resource types.
> # Dominant resource comparison is not stable when we have scarce resource types like GPU.
> A node with 192GB mem, 32 vcores, and 1 GPU available, compared to one with 168GB mem, 64 vcores, and 8 GPUs available: the former can go first because of the following logic:
> {code:java}
> if total == nil || total.Resources[k] == 0 {
>     // negative share is logged
>     if v < 0 {
>         log.Logger().Debug("usage is negative no total, share is also negative",
>             zap.Int64("resource quantity", int64(v)))
>     }
>     shares[idx] = float64(v)
>     idx++
>     continue
> }{code}
> I think we should discard the dominant resource comparison for node resources. Instead, we can just use one resource type (like vcores) to compare available resources.
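
For illustration only, the single-resource comparison proposed in the issue description could look roughly like the sketch below. The `Node` type, the `sortByAvailable` and `exampleNodes` helpers, and the resource keys are all hypothetical names made up for this example; they are not YuniKorn's actual types or API, and the two nodes are simply the ones from the description.

```go
package main

import "sort"

// Node is a hypothetical, simplified stand-in for a scheduler node.
// It is NOT the YuniKorn node type.
type Node struct {
	Name      string
	Available map[string]int64 // resource name -> available quantity
}

// sortByAvailable orders nodes in place by the available quantity of one
// resource type, largest first. Each comparison is a single integer compare;
// a resource missing from a node's map counts as 0.
func sortByAvailable(nodes []Node, resource string) {
	sort.SliceStable(nodes, func(i, j int) bool {
		return nodes[i].Available[resource] > nodes[j].Available[resource]
	})
}

// exampleNodes returns the two nodes from the issue description
// (memory in GB; names and keys are assumptions for this sketch).
func exampleNodes() []Node {
	return []Node{
		{Name: "node-a", Available: map[string]int64{"memory": 192, "vcore": 32, "gpu": 1}},
		{Name: "node-b", Available: map[string]int64{"memory": 168, "vcore": 64, "gpu": 8}},
	}
}
```

Calling `sortByAvailable(exampleNodes(), "vcore")` places the 64-vcore node ahead of the 32-vcore node, and the ordering depends only on the chosen resource type, not on a scarce resource such as GPU.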