[ 
https://issues.apache.org/jira/browse/YUNIKORN-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17801349#comment-17801349
 ] 

Peter Bacsko commented on YUNIKORN-2293:
----------------------------------------

Good catch [~Yu-Lin Chen]. In fact, I suggest dropping the entire 
"log-to-the-console" approach. Instead, we should use the 
[upload-artifact|https://github.com/actions/upload-artifact?tab=readme-ov-file] 
GitHub action. We can file a separate Jira for it.

> Flaky E2E Test: Failed asserts in LogTestClusterInfoWrapper() blocked the 
> resources cleanup steps
> -------------------------------------------------------------------------------------------------
>
>                 Key: YUNIKORN-2293
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-2293
>             Project: Apache YuniKorn
>          Issue Type: Bug
>          Components: test - e2e
>            Reporter: Yu-Lin Chen
>            Assignee: Yu-Lin Chen
>            Priority: Major
>
> If an E2E test fails, we will dump the cluster status through the following 
> functions:
>  # 
> [test/e2e/wrappers.go#LogTestClusterInfoWrapper()|https://github.com/apache/yunikorn-k8shim/blob/master/test/e2e/wrappers.go#L96]
>  # 
> [test/e2e/wrappers.go#LogYunikornContainer()|https://github.com/apache/yunikorn-k8shim/blob/master/test/e2e/wrappers.go#L129]
> However, these log functions contain several assertions, and a failed 
> assertion will block other cleanup steps in AfterEach. Incomplete cleanup can 
> cause other E2E tests to fail.
>  
> For example, E2E test 
> ([#967|https://github.com/apache/yunikorn-k8shim/actions/runs/7356744028/job/20027836104#step:6:11373])
>  failed due to a [flaky assert 
> |https://github.com/apache/yunikorn-k8shim/actions/runs/7356744028/job/20027836104#step:6:972]
>  in gang scheduling. At AfterEach time, the queue contained no application, which 
> caused an [assert 
> function|https://github.com/apache/yunikorn-k8shim/blob/master/test/e2e/wrappers.go#L112-L113]
>  to fail. Furthermore, the incomplete resource cleanup caused the following 
> E2E tests to fail as well:
>  * simple_preemptor
>  * state_aware_app_scheduling
>  * user_group_limit
> We should remove the assertions from those dump functions and only log 
> the error messages.
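
The proposal in the description (log errors instead of asserting inside the dump functions) can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual YuniKorn code: dumpStep, logClusterInfo, afterEach and errNoApp are hypothetical names, and plain Go stands in for the real test framework. The point is only that a failing dump step gets logged and counted, so the cleanup that follows still runs.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoApp is a hypothetical error mimicking the "no application in queue" case.
var errNoApp = errors.New("no application in queue")

// dumpStep is a hypothetical stand-in for one diagnostic action, such as
// fetching queue info or container logs.
type dumpStep struct {
	name string
	run  func() error
}

// logClusterInfo runs every dump step and reports failures by logging only.
// It contains no assertions, so a failing step cannot abort the caller.
// It returns the number of steps that failed.
func logClusterInfo(steps []dumpStep, logf func(format string, args ...any)) int {
	failed := 0
	for _, s := range steps {
		if err := s.run(); err != nil {
			failed++
			logf("dump step %q failed: %v", s.name, err)
		}
	}
	return failed
}

// afterEach mimics a test teardown: dump diagnostics first, then always clean up.
func afterEach(steps []dumpStep, cleanup func()) int {
	failed := logClusterInfo(steps, func(format string, args ...any) {
		fmt.Printf(format+"\n", args...)
	})
	cleanup() // reached even when some dump steps failed
	return failed
}

func main() {
	cleaned := false
	steps := []dumpStep{
		{"queue info", func() error { return errNoApp }},
		{"container logs", func() error { return nil }},
	}
	failed := afterEach(steps, func() { cleaned = true })
	fmt.Println("failed dump steps:", failed, "cleanup ran:", cleaned)
}
```

Returning a failure count instead of asserting keeps the diagnostic code side-effect free; the caller can still decide to report the count after cleanup has finished.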



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
