[
https://issues.apache.org/jira/browse/YUNIKORN-2294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802498#comment-17802498
]
Yu-Lin Chen commented on YUNIKORN-2294:
---------------------------------------
Confirmed that the issue exists when working on YUNIKORN-2305:
[https://github.com/chenyulin0719/yunikorn-k8shim/actions/runs/7406149592]
(You can download the artifact log zip file from that run to check.)
2024-01-04T06:21:17.432Z INFO core.scheduler.fsm objects/application_state.go:147 Application state transition {"appID": "appid-a2xe2", "source": "New", "destination": "Accepted", "event": "runApplication"}
{color:#de350b}*2024-01-04T06:21:37.435Z*{color} INFO core.scheduler.fsm objects/application_state.go:147 Application state transition {"appID": "appid-a2xe2", "source": "Accepted", "destination": "Failing", "event": "failApplication"}
{color:#de350b}*2024-01-04T06:21:37.701Z*{color} INFO core.scheduler.fsm objects/application_state.go:147 Application state transition {"appID": "appid-a2xe2", "source": "Failing", "destination": "Failed", "event": "failApplication"}
-> It only took 266 ms from Failing to Failed
> Flaky E2E Test: "Verify_Hard_GS_Failed_State" polling short-lived "Failing"
> application status
> ----------------------------------------------------------------------------------------------
>
> Key: YUNIKORN-2294
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2294
> Project: Apache YuniKorn
> Issue Type: Sub-task
> Components: test - e2e
> Reporter: Yu-Lin Chen
> Assignee: Yu-Lin Chen
> Priority: Major
>
> The gang_scheduling e2e test "Verify_Hard_GS_Failed_State" failed in the
> following runs:
> # https://github.com/apache/yunikorn-k8shim/actions/runs/7356744028/job/20027836104#step:6:972 (PR of YUNIKORN-2292)
> # https://github.com/apache/yunikorn-k8shim/actions/runs/7308989229/job/19960722817?pr=753#step:6:971 (PR of YUNIKORN-2247)
> The e2e test waits until the application status turns into "Failing".
> ([gang_scheduling_test.go#L288|https://github.com/apache/yunikorn-k8shim/blob/master/test/e2e/gang_scheduling/gang_scheduling_test.go#L288])
> However, the application does not stay in "Failing" for long. Below are the
> durations measured in my local test runs.
> # 0.565 seconds
> # 0.519 seconds
> # 0.634 seconds
> # 0.604 seconds
> # 0.573 seconds
> # 0.586 seconds
> # 0.587 seconds
> # 0.640 seconds
> # 0.779 seconds
> # 0.584 seconds
> (PS: Each duration is the time between the two failApplication events,
> "Accepted -> Failing" and "Failing -> Failed".)
> The polling interval of checkAppStatus() is 300 ms, which is shorter than
> every duration above, so {color:#de350b}this issue still can't be reproduced
> in my local environment.{color} However, we have no guarantee that the
> application will stay in "Failing" longer than 300 ms.
> (The dumped scheduler log of the e2e test is missing due to the issue
> described in YUNIKORN-2293: the e2e test didn't call
> tests.LogYunikornContainer() in AfterEach. Once YUNIKORN-2293 is fixed, we
> will be able to check the log of the failed run in GitHub Actions.)