[
https://issues.apache.org/jira/browse/YUNIKORN-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17808505#comment-17808505
]
Yu-Lin Chen edited comment on YUNIKORN-2329 at 1/19/24 7:16 AM:
----------------------------------------------------------------
I've attached the benchmark result. I tested 2 scenarios and compared the
performance before and after removing the goroutine in RMProxy.UpdateNode():
# Loop RMProxy.UpdateNode 1000/100000 times ([Test
Function|https://github.com/apache/yunikorn-k8shim/compare/master...chenyulin0719:yunikorn-k8shim:YUNIKORN-2329-TEST-Update-Node#diff-21bb09b8e40f7dad5e284263bebc312a2014aa10fb9ee100703184c56a28ac4eR69-R73])
# Restart YuniKorn with 5000/10000/20000 nodes created by Kwok (2 foreign pods
per node), and check the time spent on "[Step 3: Register
pods.|https://github.com/apache/yunikorn-k8shim/compare/master...chenyulin0719:yunikorn-k8shim:YUNIKORN-2329-Benchmark-Restart#diff-f4ff08b78a3be168145978950329d93472558f0cc23f0690d534a48418ff7e2cR1403-R1411]"
In the test case 1 result, the average RMProxy.UpdateNode call takes less than
1 us. Using a goroutine didn't really speed up the function call.
In the test case 2 result, removing the goroutine from RMProxy.UpdateNode also
didn't cause an obvious regression. (The average execution time difference is
only 0.5% to -5%, and I believe this falls within the margin of error.)
[~chia7712], [~ccondit-target] In this case, I think we can proceed with the
workaround solution (remove the goroutine in RMProxy.UpdateNode).
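As a reminder of why the goroutine matters for ordering, here is a minimal sketch. It is not the yunikorn-core code: sendInOrder, sendViaGoroutines and drain are hypothetical names, and plain ints stand in for the node requests. Sending from the caller's goroutine keeps the call order, while one goroutine per send leaves the order to the scheduler.

```go
package main

import (
	"fmt"
	"sync"
)

// sendInOrder enqueues 0..n-1 from the caller's goroutine, so the
// channel receives the values in call order.
func sendInOrder(n int) []int {
	ch := make(chan int, n)
	for i := 0; i < n; i++ {
		ch <- i
	}
	close(ch)
	return drain(ch)
}

// sendViaGoroutines enqueues each value from its own goroutine. The
// scheduler decides which send lands first, so order is not guaranteed.
func sendViaGoroutines(n int) []int {
	ch := make(chan int, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(v int) {
			defer wg.Done()
			ch <- v
		}(i)
	}
	wg.Wait()
	close(ch)
	return drain(ch)
}

// drain reads every value from a closed channel into a slice.
func drain(ch <-chan int) []int {
	var out []int
	for v := range ch {
		out = append(out, v)
	}
	return out
}

func main() {
	fmt.Println("synchronous:", sendInOrder(5))
	fmt.Println("goroutines: ", sendViaGoroutines(5)) // order may vary between runs
}
```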
> Remove goroutine in RMProxy.UpdateNode after benchmark
> ------------------------------------------------------
>
> Key: YUNIKORN-2329
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2329
> Project: Apache YuniKorn
> Issue Type: Sub-task
> Components: core - common
> Reporter: Yu-Lin Chen
> Assignee: Yu-Lin Chen
> Priority: Major
> Attachments: [YUNIKORN-2329] Benchmark Result.png
>
>
> As a temporary solution before finalizing the fix for the race issue
> in YUNIKORN-2327, we can temporarily remove the goroutine in
> [RMProxy.UpdateNode()|https://github.com/apache/yunikorn-core/blob/master/pkg/rmproxy/rmproxy.go#L378-L389]
> to make sure the function puts si.NodeRequest into the channel in the same
> order as it arrives from the shim. Performance benchmarking must be done
> before applying this change.
> We should revert this change when applying the final solution.