[ 
https://issues.apache.org/jira/browse/YUNIKORN-576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit updated YUNIKORN-576:
----------------------------------
    Fix Version/s: 0.10

> YK unable to schedule post rejecting an app
> -------------------------------------------
>
>                 Key: YUNIKORN-576
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-576
>             Project: Apache YuniKorn
>          Issue Type: Bug
>          Components: core - scheduler
>            Reporter: Ayub Pathan
>            Priority: Critical
>             Fix For: 0.10
>
>         Attachments: gang-app-timeout-no-gang.yaml, stack, yk.log
>
>
> * Submitted an app ([^gang-app-timeout-no-gang.yaml]) with min member 
> == parallelism. The scheduler rejects the app, and after that no newly 
> submitted app gets scheduled.
> * The app is rejected with the error below, after the placeholder pods time out.
> {noformat}
> 2021-03-16T03:12:41.214Z      INFO    scheduler/context.go:674        Invalid 
> ask add requested by shim       {"partition": "[mycluster]default", 
> "applicationID": "gang-app-timeout-1009", "askKey": 
> "cf58523b-9750-40b8-b148-b3319bdf3edf", "error": "failed to find application 
> gang-app-timeout-1009, for allocation ask 
> cf58523b-9750-40b8-b148-b3319bdf3edf"}
> 2021-03-16T03:12:41.214Z      WARN    cache/task.go:415       task allocation 
> UUID is empty, sending this release request to yunikorn-core could cause all 
> allocations of this app get released. skip this request, this may cause some 
> resource leak. check the logs for more info!  {"applicationID": 
> "gang-app-timeout-1009", "taskID": "cf58523b-9750-40b8-b148-b3319bdf3edf", 
> "taskAlias": "fifo/gang-app-timeout-1009-h5qlh", "allocationUUID": "", 
> "task": "Failed"}
> 2021-03-16T03:12:41.214Z      ERROR   cache/task.go:243       task failed     
> {"appID": "gang-app-timeout-1009", "taskID": 
> "cf58523b-9750-40b8-b148-b3319bdf3edf", "reason": "task 
> fifo/gang-app-timeout-1009-h5qlh failed because it is rejected by scheduler"}
> github.com/apache/incubator-yunikorn-k8shim/pkg/cache.(*Task).handleFailEvent
>       
> /grid/0/jenkins/workspace/workspace/App_builds/SOURCES/yunikorn-k8shim/pkg/cache/task.go:243
> github.com/looplab/fsm.(*FSM).afterEventCallbacks
>       /grid/0/jenkins/go/pkg/mod/github.com/looplab/[email protected]/fsm.go:414
> github.com/looplab/fsm.(*FSM).Event.func1
>       /grid/0/jenkins/go/pkg/mod/github.com/looplab/[email protected]/fsm.go:309
> github.com/looplab/fsm.transitionerStruct.transition
>       /grid/0/jenkins/go/pkg/mod/github.com/looplab/[email protected]/fsm.go:354
> github.com/looplab/fsm.(*FSM).doTransition
>       /grid/0/jenkins/go/pkg/mod/github.com/looplab/[email protected]/fsm.go:339
> github.com/looplab/fsm.(*FSM).Event
>       /grid/0/jenkins/go/pkg/mod/github.com/looplab/[email protected]/fsm.go:321
> github.com/apache/incubator-yunikorn-k8shim/pkg/cache.(*Task).handle
>       
> /grid/0/jenkins/workspace/workspace/App_builds/SOURCES/yunikorn-k8shim/pkg/cache/task.go:152
> github.com/apache/incubator-yunikorn-k8shim/pkg/cache.(*Context).TaskEventHandler.func1
>       
> /grid/0/jenkins/workspace/workspace/App_builds/SOURCES/yunikorn-k8shim/pkg/cache/context.go:770
> github.com/apache/incubator-yunikorn-k8shim/pkg/dispatcher.Start.func1
>       
> /grid/0/jenkins/workspace/workspace/App_builds/SOURCES/yunikorn-k8shim/pkg/dispatcher/dispatcher.go:194
> 2021-03-16T03:12:41.896Z      INFO    general/general.go:221  task completes  
> {"appType": "general", "namespace": "fifo", "podName": 
> "tg-timeout-1009-gang-app-timeout-1009-0", "podUID": 
> "11c4a9dd-7ec4-4dee-8e36-eb0dc74bb6d1", "podStatus": "Failed"}
> {noformat}
> * After this error, no newly submitted app is scheduled; its pods stay Pending.
> {noformat}
> gang-app-timeout-1010-dph4q               0/1     Pending     0          11m
> gang-app-timeout-1010-f7zmp               0/1     Pending     0          11m
> gang-app-timeout-1010-xmzfk               0/1     Pending     0          11m
> tg-timeout-1010-gang-app-timeout-1010-0   0/1     Pending     0          11m
> tg-timeout-1010-gang-app-timeout-1010-1   0/1     Pending     0          11m
> tg-timeout-1010-gang-app-timeout-1010-2   0/1     Pending     0          11m
> {noformat}
> Complete logs are attached ([^yk.log]).
> The stack trace is attached ([^stack]).
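
For triage, the structured fields at the end of each log line above (for example "applicationID" and "error") can be pulled out mechanically. The following is a minimal sketch, not part of YuniKorn: `parseLogFields` is a hypothetical helper, and it assumes the wrapped log line has first been joined back into a single line and that the message text before the JSON object contains no `{`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// parseLogFields extracts the trailing JSON object from one unwrapped log
// line of the form "<time> <level> <caller> <message> {json}".
// Hypothetical triage helper; assumes the message itself contains no '{'.
func parseLogFields(line string) (map[string]interface{}, error) {
	start := strings.Index(line, "{")
	if start < 0 {
		return nil, fmt.Errorf("no structured fields in line")
	}
	fields := make(map[string]interface{})
	if err := json.Unmarshal([]byte(line[start:]), &fields); err != nil {
		return nil, err
	}
	return fields, nil
}

func main() {
	// The first log line from the report, joined back into one line.
	line := `2021-03-16T03:12:41.214Z INFO scheduler/context.go:674 Invalid ask add requested by shim {"partition": "[mycluster]default", "applicationID": "gang-app-timeout-1009", "askKey": "cf58523b-9750-40b8-b148-b3319bdf3edf", "error": "failed to find application gang-app-timeout-1009, for allocation ask cf58523b-9750-40b8-b148-b3319bdf3edf"}`

	fields, err := parseLogFields(line)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	// Print the application and the reason the ask was refused.
	fmt.Println(fields["applicationID"], "->", fields["error"])
}
```

Running the same helper over every line of the attached log and grouping by "applicationID" would show which asks arrived after the application had already been rejected.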



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
