[ 
https://issues.apache.org/jira/browse/YUNIKORN-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit closed YUNIKORN-1919.
----------------------------------

> runningApps is not correct when app state from starting to completing
> ---------------------------------------------------------------------
>
>                 Key: YUNIKORN-1919
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-1919
>             Project: Apache YuniKorn
>          Issue Type: Bug
>          Components: core - scheduler
>            Reporter: PoAn Yang
>            Assignee: PoAn Yang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.4.0
>
>
> We increment runningApps when an app enters the Starting state[1] and 
> decrement it when the app leaves the Running state[2]. However, in some cases 
> an app never reaches the Running state (for example, it goes from Starting 
> straight to Completing), so runningApps is never decremented and stays too 
> high. As a result, no further app can be allocated in the queue[3].
>  
> Reproduce steps:
> 1. Set queue config.
> {noformat}
> data:
>   queues.yaml: |
>     partitions:
>     - name: default
>       nodesortpolicy:
>         type: fair
>       queues:
>       - name: root
>         parent: true
>         queues:
>         - name: default # default queue for applications that don't specify a queue
>           submitacl: '*'
>         - name: sandbox1
>           submitacl: '*'
>           maxapplications: 1
> {noformat}
> 2. Apply a deployment.
> {noformat}
> apiVersion: apps/v1
> kind: Deployment
> metadata:
>   name: sleep-deployment
>   labels:
>     app: sleep-deployment
>     applicationId: "sleep-deployment"
>     queue: "root.sandbox1"
> spec:
>   replicas: 1
>   selector:
>     matchLabels:
>       app: sleep-deployment
>       applicationId: "sleep-deployment"
>       queue: "root.sandbox1"
>   template:
>     metadata:
>       labels:
>         app: sleep-deployment
>         applicationId: "sleep-deployment"
>         queue: "root.sandbox1"
>     spec:
>       containers:
>       - name: sleep-30s
>         image: alpine:latest
>         command: ["sleep", "30"]
> {noformat}
> 3. Apply a job.
> {noformat}
> apiVersion: batch/v1
> kind: Job
> metadata:
>   name: sleep-job
> spec:
>   parallelism: 1
>   template:
>     metadata:
>       labels:
>         app: sleep-job
>         applicationId: "sleep-job"
>         queue: "root.sandbox1"
>     spec:
>       containers:
>       - name: sleep-job
>         image: alpine:latest
>         command: ["sleep", "30"]
>       restartPolicy: Never
> {noformat}
> 4. Delete the deployment.
> 5. The job's pod never starts.
>  
> [1] 
> [https://github.com/apache/yunikorn-core/blob/9abd5bff0b0340935f1a4467f433a941ad5f476f/pkg/scheduler/objects/application_state.go#L152]
> [2] 
> [https://github.com/apache/yunikorn-core/blob/9abd5bff0b0340935f1a4467f433a941ad5f476f/pkg/scheduler/objects/application_state.go#L188]
> [3] 
> [https://github.com/apache/yunikorn-core/blob/9abd5bff0b0340935f1a4467f433a941ad5f476f/pkg/scheduler/objects/queue.go#L1300-L1302]
>  
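
The counter imbalance described above can be illustrated with a small, self-contained Go sketch. This is a toy model under stated assumptions, not the yunikorn-core code: the names State, Queue, Transition, CanRun and simulate are hypothetical, and the "balanced" branch is only one possible way to even out the counter, not necessarily the fix that was merged for this issue.

```go
package main

import "fmt"

// State is a simplified application state; the names are illustrative only.
type State int

const (
	Accepted State = iota
	Starting
	Running
	Completing
)

// Queue is a toy stand-in for a scheduler queue with a maxapplications limit.
type Queue struct {
	maxApps     int
	runningApps int
}

// CanRun reports whether the queue may admit another application.
func (q *Queue) CanRun() bool {
	return q.runningApps < q.maxApps
}

// Transition updates the counter for a state change.
// With balanced == false it mimics the reported behaviour: increment on
// entering Starting, decrement only on leaving Running.
// With balanced == true (an assumption, not the project's actual fix) it
// also decrements when an app leaves Starting without reaching Running.
func (q *Queue) Transition(from, to State, balanced bool) {
	if to == Starting {
		q.runningApps++
	}
	if from == Running {
		q.runningApps--
	}
	if balanced && from == Starting && to != Running {
		q.runningApps--
	}
}

// simulate drives one app through Accepted -> Starting -> Completing on a
// queue limited to a single application and returns the queue.
func simulate(balanced bool) *Queue {
	q := &Queue{maxApps: 1}
	q.Transition(Accepted, Starting, balanced)
	q.Transition(Starting, Completing, balanced)
	return q
}

func main() {
	leaky := simulate(false)
	fmt.Println(leaky.runningApps, leaky.CanRun()) // 1 false

	fixed := simulate(true)
	fmt.Println(fixed.runningApps, fixed.CanRun()) // 0 true
}
```

In the unbalanced run the finished app still counts against maxapplications: 1, which matches the symptom in step 5 of the reproduction.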



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
