[ 
https://issues.apache.org/jira/browse/YUNIKORN-2070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

PoAn Yang resolved YUNIKORN-2070.
---------------------------------
     Fix Version/s: 1.4.0
    Target Version: 1.4.0
        Resolution: Fixed

Merged to master.

> E2e tests for gang_scheduling failed due to containers init were OOM-Killed
> ---------------------------------------------------------------------------
>
>                 Key: YUNIKORN-2070
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-2070
>             Project: Apache YuniKorn
>          Issue Type: Bug
>          Components: test - e2e
>            Reporter: Yu-Lin Chen
>            Assignee: Yu-Lin Chen
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.4.0
>
>         Attachments: [YUNIKORN-2067] e2e-tests (v1.25.11) - OOM-Killed.txt
>
>
> Recently we have seen several gang scheduling failures in the CI e2e tests. In 
> all of them the test was waiting for placeholders (with a 10M memory limit) to 
> be created, but some placeholders failed with the following OOM-killed error:
> {code:java}
> “Error: failed to create containerd task: failed to create shim task: OCI 
> runtime create failed: runc create failed: unable to start container process: 
> container init was OOM-killed (memory limit too low?): unknown” {code}
> The root cause might be that peak memory usage varies when the OCI runtime 
> creates multiple containers. We can try raising the placeholder memory limit 
> from 10M to 20M in the e2e tests. (The sleep jobs already use 20M of memory.)
> Some failed e2e test runs from the last 3 weeks:
>  # [Link|https://github.com/apache/yunikorn-k8shim/actions/runs/6604772421/job/17945394693#step:5:2452]: target 15 placeholders, 14 created, 1 OOM-killed.
>  # [Link|https://github.com/apache/yunikorn-k8shim/actions/runs/6596361827/job/17922430982#step:5:2315]: target 3 placeholders, 2 created, 1 OOM-killed.
>  # [Link|https://github.com/apache/yunikorn-k8shim/actions/runs/6408692237/job/17436748282#step:5:2510]: target 3 placeholders, 2 created, 1 OOM-killed.
>  # [Link|https://github.com/apache/yunikorn-k8shim/actions/runs/6545212501/job/17773871963#step:5:2798]: target 15 placeholders, 11 created, 4 OOM-killed.
>  

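For illustration only, the proposed change from 10M to 20M can be sketched as the value of a yunikorn.apache.org/task-groups annotation. This is not the code from the actual fix. It assumes the placeholder memory limit is derived from the task group's minResource entry, and the Go type, the helper name taskGroupsAnnotation, the group name and the member count are all hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// taskGroup mirrors the JSON fields of one entry in the
// yunikorn.apache.org/task-groups pod annotation.
// The Go type itself is hypothetical and defined here for illustration only.
type taskGroup struct {
	Name        string            `json:"name"`
	MinMember   int               `json:"minMember"`
	MinResource map[string]string `json:"minResource"`
}

// taskGroupsAnnotation renders the annotation value for a single task group
// whose placeholders ask for the given amount of memory.
func taskGroupsAnnotation(name string, minMember int, memory string) (string, error) {
	groups := []taskGroup{{
		Name:        name,
		MinMember:   minMember,
		MinResource: map[string]string{"memory": memory},
	}}
	raw, err := json.Marshal(groups)
	if err != nil {
		return "", err
	}
	return string(raw), nil
}

func main() {
	// 20M is the value proposed in the issue; the group name and the
	// member count are made up for this example.
	value, err := taskGroupsAnnotation("group-a", 3, "20M")
	if err != nil {
		panic(err)
	}
	fmt.Println(value)
}
```

Passing "10M" instead of "20M" would reproduce the configuration that the issue reports as too low for container init.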


--
This message was sent by Atlassian Jira
(v8.20.10#820010)
