[
https://issues.apache.org/jira/browse/YUNIKORN-2294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Peter Bacsko resolved YUNIKORN-2294.
------------------------------------
Fix Version/s: 1.5.0
Resolution: Fixed
> Flaky E2E Test: "Verify_Hard_GS_Failed_State" polling short-lived "Failing"
> application status
> ----------------------------------------------------------------------------------------------
>
> Key: YUNIKORN-2294
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2294
> Project: Apache YuniKorn
> Issue Type: Sub-task
> Components: test - e2e
> Reporter: Yu-Lin Chen
> Assignee: Yu-Lin Chen
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.5.0
>
>
> The gang_scheduling e2e test "Verify_Hard_GS_Failed_State" failed in the
> following runs:
> # https://github.com/apache/yunikorn-k8shim/actions/runs/7356744028/job/20027836104#step:6:972
> (PR of YUNIKORN-2292)
> # https://github.com/apache/yunikorn-k8shim/actions/runs/7308989229/job/19960722817?pr=753#step:6:971
> (PR of YUNIKORN-2247)
> The e2e test waits until the application status turns to 'Failing'.
> ([gang_scheduling_test.go#L288|https://github.com/apache/yunikorn-k8shim/blob/master/test/e2e/gang_scheduling/gang_scheduling_test.go#L288])
> However, the application does not stay in "Failing" for long. Below are the
> durations measured in my local tests.
> # 0.565 seconds
> # 0.519 seconds
> # 0.634 seconds
> # 0.604 seconds
> # 0.573 seconds
> # 0.586 seconds
> # 0.587 seconds
> # 0.640 seconds
> # 0.779 seconds
> # 0.584 seconds
> (Note: each value is the time between the two failApplication events,
> "Accept->Failing" and "Failing -> Failed".)
> checkAppStatus() polls every 300 ms, which is shorter than every duration
> measured above, so this issue still cannot be reproduced in my local
> environment. However, there is no guarantee that the application will stay
> in 'Failing' for longer than 300 ms.
> (The dumped scheduler log of the e2e test is missing because of the issue
> described in YUNIKORN-2293: the e2e test did not call
> tests.LogYunikornContainer() in AfterEach. Once YUNIKORN-2293 is fixed, we
> will be able to check the failure log in the GitHub Actions run.)
--
This message was sent by Atlassian Jira
(v8.20.10#820010)