[jira] [Resolved] (YUNIKORN-528) Nil pointer exception while getting both termination and delete pod event

Wilfred Spiegelenburg (Jira) Mon, 01 Feb 2021 00:59:04 -0800


     [ 
https://issues.apache.org/jira/browse/YUNIKORN-528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Wilfred Spiegelenburg resolved YUNIKORN-528.
--------------------------------------------
    Fix Version/s: 0.10
       Resolution: Fixed

Changes committed to trunk and 0.10

Shim changes to not send duplicates will be handled separately.

> Nil pointer exception while getting both termination and delete pod event
> -------------------------------------------------------------------------
>
>                 Key: YUNIKORN-528
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-528
>             Project: Apache YuniKorn
>          Issue Type: Sub-task
>          Components: shim - kubernetes
>            Reporter: Weiwei Yang
>            Assignee: Weiwei Yang
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.10
>
>         Attachments: nil.log
>
>
> During testing, I observed that the scheduler occasionally hits a nil pointer 
> dereference like the one below:
> {code}
> 4-261f-4448-bc0f-5ea14d23f9e8"}
> panic: runtime error: invalid memory address or nil pointer dereference
> [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x190cf5d]
> goroutine 114 [running]:
> github.com/apache/incubator-yunikorn-core/pkg/scheduler/objects.(*Application).ReplaceAllocation(0xc004250000, 0xc0038e01b0, 0x24, 0x0)
>       /Users/wyang/go/pkg/mod/github.com/apache/[email protected]/pkg/scheduler/objects/application.go:1026 +0xcd
> github.com/apache/incubator-yunikorn-core/pkg/scheduler.(*PartitionContext).removeAllocation(0xc0026de600, 0xc0003c0a10, 0x0, 0x0, 0x0, 0x0)
>       /Users/wyang/go/pkg/mod/github.com/apache/[email protected]/pkg/scheduler/partition.go:1137 +0x14b5
> github.com/apache/incubator-yunikorn-core/pkg/scheduler.(*ClusterContext).processAllocationReleases(0xc0001400f0, 0xc0066400c0, 0x1, 0x1, 0x7ffeefbff80f, 0x9)
>       /Users/wyang/go/pkg/mod/github.com/apache/[email protected]/pkg/scheduler/context.go:683 +0x150
> github.com/apache/incubator-yunikorn-core/pkg/scheduler.(*ClusterContext).processAllocations(0xc0001400f0, 0xc006730000)
>       /Users/wyang/go/pkg/mod/github.com/apache/[email protected]/pkg/scheduler/context.go:606 +0x185
> github.com/apache/incubator-yunikorn-core/pkg/scheduler.(*ClusterContext).processRMUpdateEvent(0xc0001400f0, 0xc0066ee0b8)
>       /Users/wyang/go/pkg/mod/github.com/apache/[email protected]/pkg/scheduler/context.go:213 +0x77
> github.com/apache/incubator-yunikorn-core/pkg/scheduler.(*Scheduler).handleRMEvent(0xc00000e3c0)
>       /Users/wyang/go/pkg/mod/github.com/apache/[email protected]/pkg/scheduler/scheduler.go:112 +0x416
> created by github.com/apache/incubator-yunikorn-core/pkg/scheduler.(*Scheduler).StartService
>       /Users/wyang/go/pkg/mod/github.com/apache/[email protected]/pkg/scheduler/scheduler.go:54 +0xa2
> make: *** [run] Error 2
> {code}
> The root cause is that when the shim deletes a placeholder, it can sometimes 
> trigger two events:
> * Pod Update
> * Pod Delete
> The shim sends a release request to the core both when a pod is updated to 
> the TERMINATED state and when the pod is DELETED. When the second release 
> request arrives, the first one has already removed the allocation, so the 
> core dereferences a nil pointer. We need to avoid sending a second release 
> if the pod has already been released.
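
For illustration only, the duplicate-release scenario described in the issue could be guarded against roughly as follows. This is a minimal sketch, not the committed fix: `Application`, `Allocation`, `ReleaseTracker`, `MarkReleased`, and `handleRelease` are all hypothetical names invented for this example and do not correspond to the actual yunikorn-core or shim code. It assumes the shim can remember which pods it has already released, and that the core treats a release for an unknown allocation as a no-op rather than dereferencing the missing entry.

```go
package main

import (
	"fmt"
	"sync"
)

// Allocation is a hypothetical stand-in for a scheduler allocation.
type Allocation struct {
	UUID string
}

// Application is a hypothetical, simplified application that tracks
// its allocations by UUID.
type Application struct {
	mu          sync.Mutex
	allocations map[string]*Allocation
}

func NewApplication() *Application {
	return &Application{allocations: make(map[string]*Allocation)}
}

func (a *Application) AddAllocation(alloc *Allocation) {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.allocations[alloc.UUID] = alloc
}

// RemoveAllocation removes and returns the allocation, or returns nil
// when the UUID is unknown (for example, because it was already released).
func (a *Application) RemoveAllocation(uuid string) *Allocation {
	a.mu.Lock()
	defer a.mu.Unlock()
	alloc := a.allocations[uuid]
	delete(a.allocations, uuid) // deleting a missing key is a no-op
	return alloc
}

// ReleaseTracker is a hypothetical shim-side helper that remembers
// which pods have already had a release request sent.
type ReleaseTracker struct {
	mu       sync.Mutex
	released map[string]bool
}

func NewReleaseTracker() *ReleaseTracker {
	return &ReleaseTracker{released: make(map[string]bool)}
}

// MarkReleased returns true only the first time it is called for a pod.
func (r *ReleaseTracker) MarkReleased(podUID string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.released[podUID] {
		return false
	}
	r.released[podUID] = true
	return true
}

// handleRelease is the hypothetical core-side handler. It checks for nil
// before using the allocation, so a duplicate release is ignored.
func handleRelease(app *Application, uuid string) bool {
	alloc := app.RemoveAllocation(uuid)
	if alloc == nil {
		return false // nothing to release: already removed or never known
	}
	return true
}

func main() {
	app := NewApplication()
	app.AddAllocation(&Allocation{UUID: "placeholder-1"})
	tracker := NewReleaseTracker()

	// The shim sees two events for the same placeholder pod.
	for _, event := range []string{"update(TERMINATED)", "delete"} {
		if !tracker.MarkReleased("placeholder-1") {
			fmt.Println(event, "-> release already sent, skipping")
			continue
		}
		fmt.Println(event, "-> released:", handleRelease(app, "placeholder-1"))
	}

	// A duplicate that still reaches the core is ignored instead of panicking.
	fmt.Println("duplicate at core -> released:", handleRelease(app, "placeholder-1"))
}
```

Guarding on both sides is a deliberate choice in this sketch: the shim-side tracker avoids the redundant request, while the core-side nil check keeps the scheduler alive if a duplicate arrives anyway.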



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
