[ 
https://issues.apache.org/jira/browse/YUNIKORN-268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17154734#comment-17154734
 ] 

Weiwei Yang commented on YUNIKORN-268:
--------------------------------------

hi [~Huang Ting Yao]

Could you please work on this issue? In the k8shim repo, we have an 
{{ApplicationManagementProtocol}} interface defined for various application 
management (AM) operations. One of its APIs is: RemoveApplication(appID string) error.
Currently, this is not fully implemented. The expectation is that when this API is 
called, the shim sends an {{UpdateRequest}} to the core with the following 
field:

 UpdateRequest {
    RemoveApplications []*RemoveApplicationRequest
 }

to get the app removed.
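
As a rough illustration of the request described above, here is a minimal sketch in Go. The two struct types are local stand-ins, not the real scheduler-interface definitions; the ApplicationID field name and the buildRemoveRequest helper are assumptions made only for this example.

```go
package main

// RemoveApplicationRequest is a local stand-in for the scheduler-interface
// message. The ApplicationID field name is an assumption for illustration.
type RemoveApplicationRequest struct {
	ApplicationID string
}

// UpdateRequest is a local stand-in that carries only the field mentioned
// in the comment above; the real message may hold more.
type UpdateRequest struct {
	RemoveApplications []*RemoveApplicationRequest
}

// buildRemoveRequest is a hypothetical helper that wraps a single app ID
// into an UpdateRequest asking the core to remove that application.
func buildRemoveRequest(appID string) *UpdateRequest {
	return &UpdateRequest{
		RemoveApplications: []*RemoveApplicationRequest{
			{ApplicationID: appID},
		},
	}
}
```

In the shim, a request built this way would then be handed to whatever call the shim already uses to send updates to the core.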

Currently, the scheduler core removes all existing allocations and pending 
asks when an app is removed. So if the shim lets some pods continue to run, it 
will cause resource accounting issues. To avoid that, I think we should add 
a safeguard so that {{RemoveApplication()}} can only proceed when 
the app doesn't have any running pods. If there are still running pods, we 
fail the remove operation directly.
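
The safeguard could look roughly like the following sketch. It is not the shim's actual code: the appManager and application types, the runningPods counter, and the sendRemove callback are all hypothetical names used to show the check-then-remove order.

```go
package main

import (
	"fmt"
	"sync"
)

// application is a hypothetical cache entry; runningPods stands in for
// however the shim tracks pods that are still running.
type application struct {
	runningPods int
}

// appManager is a hypothetical holder of the shim-side application cache.
// sendRemove stands in for sending the remove request to the core.
type appManager struct {
	mu         sync.Mutex
	apps       map[string]*application
	sendRemove func(appID string) error
}

// RemoveApplication refuses to remove an app that still has running pods.
// Only when none are left does it notify the core and drop the cache entry.
func (m *appManager) RemoveApplication(appID string) error {
	m.mu.Lock()
	defer m.mu.Unlock()

	app, ok := m.apps[appID]
	if !ok {
		return fmt.Errorf("application %s not found", appID)
	}
	if app.runningPods > 0 {
		// Safeguard: fail directly instead of letting the core drop
		// allocations for pods that keep running.
		return fmt.Errorf("application %s still has %d running pod(s)",
			appID, app.runningPods)
	}
	if err := m.sendRemove(appID); err != nil {
		return err
	}
	delete(m.apps, appID)
	return nil
}
```

Failing before anything is sent to the core keeps the shim cache and the core in agreement: either both still know the app, or both have dropped it.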

Please let me know if you have any questions.

> When deleting a deployment the application is not deleted from the shim cache
> -----------------------------------------------------------------------------
>
>                 Key: YUNIKORN-268
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-268
>             Project: Apache YuniKorn
>          Issue Type: Sub-task
>          Components: shim - kubernetes
>            Reporter: Kinga Marton
>            Assignee: Ting Yao,Huang
>            Priority: Major
>
> Steps to reproduce:
> 1. submit an example, such as the nginx example: 
> related log entry:
> {code:bash}
> 2020-07-08T15:15:54.683Z DEBUG general/general.go:155 pod added {"appType": 
> "general", "Name": "nginx-556d7c974d-sn7kf", "Namespace": "default", 
> "NeedsRecovery": false}
> 2020-07-08T15:15:54.684Z DEBUG cache/context.go:185 adding pod to cache 
> {"podName": "nginx-556d7c974d-sn7kf"}
> 2020-07-08T15:15:54.684Z DEBUG cache/context.go:455 AddApplication 
> {"Request": 
> {"Metadata":{"ApplicationID":"example2","QueueName":"root.sandbox","User":"default","Tags":{"namespace":"default"}},"Recovery":false}}
> 2020-07-08T15:15:54.684Z DEBUG cache/context.go:465 app namespace info 
> {"appID": "example2", "namespace": "default"}
> 2020-07-08T15:15:54.684Z INFO cache/context.go:495 app added {"appID": 
> "example2", "recovery": false}
> 2020-07-08T15:15:54.684Z DEBUG cache/context.go:526 AddTask {"appID": 
> "example2", "taskID": "8fe2c175-1521-41aa-a3a8-bf7f47668b6c", "isRecovery": 
> false}{code}
> 2. delete the example 
> 3. submit the example again, or any job with the same applicationId
> related log entry:
> {code:bash}
> 2020-07-08T15:25:21.754Z DEBUG general/general.go:155 pod added {"appType": 
> "general", "Name": "nginx-556d7c974d-qtxfg", "Namespace": "default", 
> "NeedsRecovery": false}
> 2020-07-08T15:25:21.754Z DEBUG general/general.go:155 pod added {"appType": 
> "general", "Name": "nginx-556d7c974d-qtxfg", "Namespace": "default", 
> "NeedsRecovery": false}
> 2020-07-08T15:25:21.754Z DEBUG cache/context.go:526 AddTask {"appID": 
> "example2", "taskID": "c4191833-cbdf-4143-ab53-c5cc2ccddf43", "isRecovery": 
> false}
> 2020-07-08T15:25:21.754Z DEBUG cache/context.go:185 adding pod to cache 
> {"podName": "nginx-556d7c974d-qtxfg"}
> 2020-07-08T15:25:21.754Z INFO cache/context.go:542 task added {"appID": 
> "example2", "taskID": "c4191833-cbdf-4143-ab53-c5cc2ccddf43", "taskState": 
> "New"}{code}
> When the example is submitted again with the same application ID, the 
> application is still found in the cache, so the add application step is 
> skipped.
>   



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
