Yu-Lin Chen created YUNIKORN-2294:
-------------------------------------

             Summary: Flacky E2E Test: "Verify_Hard_GS_Failed_State" polling 
short-lived "Failing" application status
                 Key: YUNIKORN-2294
                 URL: https://issues.apache.org/jira/browse/YUNIKORN-2294
             Project: Apache YuniKorn
          Issue Type: Bug
          Components: test - e2e
            Reporter: Yu-Lin Chen
            Assignee: Yu-Lin Chen


We got below E2E test fails In gang_scheduling e2e test 
“Verify_Hard_GS_Failed_State”.
 # 
[https://github.com/apache/yunikorn-k8shim/actions/runs/7356744028/job/20027836104#step:6:972
 
|https://github.com/apache/yunikorn-k8shim/actions/runs/7356744028/job/20027836104#step:6:972](PR
 of YUNIKORN-2292)
 # 
[https://github.com/apache/yunikorn-k8shim/actions/runs/7308989229/job/19960722817?pr=753#step:6:971
 
|https://github.com/apache/yunikorn-k8shim/actions/runs/7308989229/job/19960722817?pr=753#step:6:971](PR
 of YUNIKORN-2247)

The e2e test waits until application status turn into ‘Failing’. 
([gang_scheduling_test.go#L288|https://github.com/apache/yunikorn-k8shim/blob/master/test/e2e/gang_scheduling/gang_scheduling_test.go#L288])
 However, the application won't stay in "Failing" too long.  Below are my local 
test results.

 # 0.565 seconds
 # 0.519 seconds
 # 0.634 seconds
 # 0.604 seconds
 # 0.573 seconds
 # 0.586 seconds
 # 0.587 seconds
 # 0.640 seconds
 # 0.779 seconds
 # 0.584 seconds

(PS: Compare the time between 2 failApplication events, "Accept->Failing", 
"Failing -> Failed")

The polling frequency of checkAppStatus() is 300ms, so {color:#de350b}this 
issue still can't be reproduced in my local environment.{color} However, we 
still have no guarantee that the application will stay in 'Failing' longer than 
300 ms.

(The dumped scheduler log of the e2e test is missing due to the issue mentioned 
in YUNIKORN-2293. The e2e test didn't call tests.LogYunikornContainer() in 
AfterEach. After YUNIKORN-2293 fixed, we will be able to check the failed log 
in Github action.)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to