[
https://issues.apache.org/jira/browse/YUNIKORN-1715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17721232#comment-17721232
]
Peter Bacsko edited comment on YUNIKORN-1715 at 5/10/23 7:42 AM:
-----------------------------------------------------------------
[~yichiu] as we discussed on Slack:
# Try to set up Kwok with YuniKorn
# Multiple test scenarios:
** A few apps with a lot of pods (10 / 1000)
** A balanced number of apps and pods (100 / 100)
** A lot of apps with few pods (1000 / 10)
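A rough sketch of the three scenarios above, assuming each pair means (number of applications / pods per application). Under that reading every scenario creates the same total of 10,000 pods, so the runs differ only in how the pods are spread across applications. The {{scenario}} type and its field names are hypothetical and not part of YuniKorn or Kwok.

```go
package main

import "fmt"

// scenario is a hypothetical description of one load test run.
type scenario struct {
	name       string
	apps       int // number of applications
	podsPerApp int // pods per application (assumed meaning of the second number)
}

// totalPods returns the number of pods the scenario creates.
func (s scenario) totalPods() int {
	return s.apps * s.podsPerApp
}

// scenarios lists the three shapes mentioned in the comment.
func scenarios() []scenario {
	return []scenario{
		{name: "few apps, many pods", apps: 10, podsPerApp: 1000},
		{name: "balanced", apps: 100, podsPerApp: 100},
		{name: "many apps, few pods", apps: 1000, podsPerApp: 10},
	}
}

func main() {
	for _, s := range scenarios() {
		fmt.Printf("%-20s apps=%d podsPerApp=%d total=%d\n",
			s.name, s.apps, s.podsPerApp, s.totalPods())
	}
}
```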
Priorities:
# Check the heap & CPU profiles, which are available on the REST interface
# Network/block/mutex profiles
# Traces
We expose the pprof endpoints on the REST interface, see https://pkg.go.dev/net/http/pprof
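A minimal sketch of how the profile URLs for the priorities above could be built. The paths under {{/debug/pprof/}} are the standard ones registered by {{net/http/pprof}}. The base address {{http://localhost:9080}} is an assumption about where the scheduler's REST interface is reachable, and {{profileURL}} is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// profileURL builds the URL of a net/http/pprof endpoint.
// base is the address of the REST interface (an assumption in this sketch).
// seconds is only added when positive; the "profile" (CPU) and "trace"
// endpoints accept it as the sampling duration.
func profileURL(base, profile string, seconds int) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = "/debug/pprof/" + profile
	if seconds > 0 {
		q := url.Values{}
		q.Set("seconds", strconv.Itoa(seconds))
		u.RawQuery = q.Encode()
	}
	return u.String(), nil
}

func main() {
	base := "http://localhost:9080" // assumed address, adjust to the deployment
	targets := []struct {
		profile string
		seconds int
	}{
		{"heap", 0},
		{"profile", 30}, // CPU profile
		{"block", 0},
		{"mutex", 0},
		{"trace", 5},
	}
	for _, t := range targets {
		u, err := profileURL(base, t.profile, t.seconds)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(u)
	}
}
```

The heap, CPU, block and mutex URLs can be passed directly to {{go tool pprof <url>}}. The trace output has to be saved to a file first and opened with {{go tool trace}}. Block and mutex profiles stay empty unless the process enables them with {{runtime.SetBlockProfileRate}} and {{runtime.SetMutexProfileFraction}}, so that may need checking first.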
> Yunikorn performance improvements
> ---------------------------------
>
> Key: YUNIKORN-1715
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1715
> Project: Apache YuniKorn
> Issue Type: Improvement
> Components: core - scheduler
> Reporter: Peter Bacsko
> Assignee: Peter Bacsko
> Priority: Major
>
> There are some methods/functions in YuniKorn which are called frequently and
> often unnecessarily. On a large, busy cluster, eliminating these calls can
> result in a faster scheduling cycle and therefore better throughput.
> In the cases listed below, we can reuse a previously computed value and
> eliminate the expensive copy/sort phase completely.
> *Retrieving node iterators*: in
> {{baseNodeCollection.getNodeIteratorInternal()}}, we always clone the
> tree of sorted nodes, then build a slice from it. The node tree is only
> modified when a node gets a new score or when a node is added or removed. By
> reusing the sorted list, we avoid cloning an {{*btree.BTree}} structure and
> creating {{[]*Node}} slices.
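The idea of reusing the sorted list can be sketched as follows. This is not the YuniKorn implementation: it swaps the {{*btree.BTree}} for a plain map plus {{sort.Slice}} to stay within the standard library, it ignores locking, and all type and method names ({{nodeCollection}}, {{SortedNodes}}, {{rebuilds}}) are hypothetical. The sorted slice is rebuilt only after a node is added, removed or rescored.

```go
package main

import (
	"fmt"
	"sort"
)

// Node is a simplified stand-in for a scheduler node.
type Node struct {
	ID    string
	Score float64
}

// nodeCollection caches the sorted node slice between modifications.
type nodeCollection struct {
	nodes    map[string]*Node
	sorted   []*Node // cached snapshot; nil means it must be rebuilt
	rebuilds int     // counts rebuilds, for illustration only
}

func newNodeCollection() *nodeCollection {
	return &nodeCollection{nodes: make(map[string]*Node)}
}

// AddNode stores the node and invalidates the cached slice.
func (c *nodeCollection) AddNode(n *Node) {
	c.nodes[n.ID] = n
	c.sorted = nil
}

// RemoveNode deletes the node and invalidates the cached slice.
func (c *nodeCollection) RemoveNode(id string) {
	delete(c.nodes, id)
	c.sorted = nil
}

// UpdateScore invalidates the cache only if the score really changed.
func (c *nodeCollection) UpdateScore(id string, score float64) {
	if n, ok := c.nodes[id]; ok && n.Score != score {
		n.Score = score
		c.sorted = nil
	}
}

// SortedNodes returns the cached slice, rebuilding it only when stale.
// Callers must treat the returned slice as read-only.
func (c *nodeCollection) SortedNodes() []*Node {
	if c.sorted == nil {
		s := make([]*Node, 0, len(c.nodes))
		for _, n := range c.nodes {
			s = append(s, n)
		}
		sort.Slice(s, func(i, j int) bool {
			if s[i].Score != s[j].Score {
				return s[i].Score < s[j].Score
			}
			return s[i].ID < s[j].ID // tie-break for a stable order
		})
		c.sorted = s
		c.rebuilds++
	}
	return c.sorted
}

func main() {
	c := newNodeCollection()
	c.AddNode(&Node{ID: "node-a", Score: 3})
	c.AddNode(&Node{ID: "node-b", Score: 1})
	for i := 0; i < 3; i++ {
		c.SortedNodes() // only the first call rebuilds
	}
	c.UpdateScore("node-b", 5)
	for _, n := range c.SortedNodes() {
		fmt.Println(n.ID, n.Score)
	}
	fmt.Println("rebuilds:", c.rebuilds)
}
```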
> *Queue sorting*: sorting is only needed if one of the following occurred:
> * Allocated resource changed in one of the child queues (most common)
> * Pending resource changed from 0 to "n", or from "n" to 0 (affects
> filtering)
> * A child queue was stopped (affects filtering)
> * Child queue structure changed on a config update
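The conditions above can be sketched as a "needs sort" flag on the parent queue. This is a simplified illustration, not YuniKorn code: resources are reduced to a single {{int64}}, sorting by allocated amount is a placeholder policy, there is no locking, and every name ({{parentQueue}}, {{childQueue}}, {{needsSort}}, {{sortedChildren}}) is hypothetical. Note that a pending change only sets the flag when it crosses zero, because only then does the filtering result change.

```go
package main

import (
	"fmt"
	"sort"
)

type parentQueue struct {
	children  []*childQueue
	cached    []*childQueue // filtered and sorted children
	needsSort bool
	sorts     int // counts re-sorts, for illustration only
}

type childQueue struct {
	name      string
	parent    *parentQueue
	allocated int64
	pending   int64
	stopped   bool
}

// addChild changes the queue structure, so a re-sort is required.
func (p *parentQueue) addChild(name string) *childQueue {
	c := &childQueue{name: name, parent: p}
	p.children = append(p.children, c)
	p.needsSort = true
	return c
}

// setAllocated marks the parent on any change of the allocated amount.
func (c *childQueue) setAllocated(v int64) {
	if v != c.allocated {
		c.allocated = v
		c.parent.needsSort = true
	}
}

// setPending marks the parent only on a 0 -> n or n -> 0 transition.
func (c *childQueue) setPending(v int64) {
	if (c.pending == 0) != (v == 0) {
		c.parent.needsSort = true
	}
	c.pending = v
}

// stop removes the queue from scheduling, which affects filtering.
func (c *childQueue) stop() {
	if !c.stopped {
		c.stopped = true
		c.parent.needsSort = true
	}
}

// sortedChildren filters and sorts only when something relevant changed.
func (p *parentQueue) sortedChildren() []*childQueue {
	if p.needsSort {
		list := make([]*childQueue, 0, len(p.children))
		for _, c := range p.children {
			if !c.stopped && c.pending > 0 {
				list = append(list, c)
			}
		}
		sort.SliceStable(list, func(i, j int) bool {
			return list[i].allocated < list[j].allocated
		})
		p.cached = list
		p.needsSort = false
		p.sorts++
	}
	return p.cached
}

func main() {
	p := &parentQueue{}
	a := p.addChild("a")
	b := p.addChild("b")
	a.setPending(10)
	b.setPending(4)
	b.setAllocated(2)
	a.setAllocated(6)
	p.sortedChildren()
	a.setPending(9) // still non-zero: no re-sort needed
	for _, c := range p.sortedChildren() {
		fmt.Println(c.name, c.allocated, c.pending)
	}
	fmt.Println("sorts:", p.sorts)
}
```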
> *Application sorting*: in {{Queue.TryAllocate()}} and
> {{Queue.TryPlaceholderAllocate()}}, {{sortApplications()}} always runs.
> In every iteration, it calls {{Queue.GetCopyOfApps()}} and then sorts
> the apps. It only has to run if something relevant to the sort order
> happens:
> * Application added/removed
> * Ask added to an application
> * Ask max priority changed in at least one application
> * Allocated resource changed in at least one application
> *Request sorting*: request (ask) sorting is only necessary when one of the
> following occurs:
> * Ask added
> * {{pendingAskRepeat}} reaches 0 in an ask
> *Misc*: other changes that help performance can be added here as well.