[ https://issues.apache.org/jira/browse/YUNIKORN-2294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802498#comment-17802498 ]

Yu-Lin Chen commented on YUNIKORN-2294:
---------------------------------------

Confirmed that the issue exists while working on YUNIKORN-2305:
[https://github.com/chenyulin0719/yunikorn-k8shim/actions/runs/7406149592]
(The scheduler log can be checked by downloading the artifact log zip file.)



2024-01-04T06:21:17.432Z INFO core.scheduler.fsm objects/application_state.go:147 Application state transition \{"appID": "appid-a2xe2", "source": "New", "destination": "Accepted", "event": "runApplication"}

{color:#de350b}*2024-01-04T06:21:37.435Z*{color} INFO core.scheduler.fsm objects/application_state.go:147 Application state transition \{"appID": "appid-a2xe2", "source": "Accepted", "destination": "Failing", "event": "failApplication"}

{color:#de350b}*2024-01-04T06:21:37.701Z*{color} INFO core.scheduler.fsm objects/application_state.go:147 Application state transition \{"appID": "appid-a2xe2", "source": "Failing", "destination": "Failed", "event": "failApplication"}

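-> The application stayed in "Failing" for only 266 ms before moving to "Failed", which is shorter than the 300 ms polling interval of checkAppStatus().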

> Flaky E2E Test: "Verify_Hard_GS_Failed_State" polling short-lived "Failing" application status
> ----------------------------------------------------------------------------------------------
>
>                 Key: YUNIKORN-2294
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-2294
>             Project: Apache YuniKorn
>          Issue Type: Sub-task
>          Components: test - e2e
>            Reporter: Yu-Lin Chen
>            Assignee: Yu-Lin Chen
>            Priority: Major
>
> The following runs of the gang_scheduling e2e test “Verify_Hard_GS_Failed_State” failed:
> # [https://github.com/apache/yunikorn-k8shim/actions/runs/7356744028/job/20027836104#step:6:972] (PR of YUNIKORN-2292)
> # [https://github.com/apache/yunikorn-k8shim/actions/runs/7308989229/job/19960722817?pr=753#step:6:971] (PR of YUNIKORN-2247)
> The e2e test waits until the application status turns into ‘Failing’ ([gang_scheduling_test.go#L288|https://github.com/apache/yunikorn-k8shim/blob/master/test/e2e/gang_scheduling/gang_scheduling_test.go#L288]).
> However, the application does not stay in "Failing" for long. Below are my local test results (time spent in "Failing"):
>  # 0.565 seconds
>  # 0.519 seconds
>  # 0.634 seconds
>  # 0.604 seconds
>  # 0.573 seconds
>  # 0.586 seconds
>  # 0.587 seconds
>  # 0.640 seconds
>  # 0.779 seconds
>  # 0.584 seconds
> (PS: Each value is the time between the two failApplication events, "Accepted -> Failing" and "Failing -> Failed".)
> The polling interval of checkAppStatus() is 300 ms. Since every local duration above is longer than 300 ms, {color:#de350b}this issue still can't be reproduced in my local environment.{color} However, there is no guarantee that the application will stay in 'Failing' for longer than 300 ms.
> (The dumped scheduler log of the e2e test is missing due to the issue mentioned in YUNIKORN-2293: the e2e test doesn't call tests.LogYunikornContainer() in AfterEach. Once YUNIKORN-2293 is fixed, we will be able to check the scheduler log of failed runs in GitHub Actions.)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
