[ https://issues.apache.org/jira/browse/YUNIKORN-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17799178#comment-17799178 ]

Craig Condit commented on YUNIKORN-2280:
----------------------------------------

Nearly all of the goroutines (7,240 of 7,352) are blocked in places similar to 
this one:
{code}
goroutine 166864353 [sync.Mutex.Lock]:
sync.runtime_SemacquireMutex(0x0?, 0x1?, 0xc2103cf200?)
        runtime/sema.go:77 +0x25
sync.(*Mutex).lockSlow(0xc055885298)
        sync/mutex.go:171 +0x15d
sync.(*Mutex).Lock(...)
        sync/mutex.go:90
k8s.io/client-go/tools/events.(*eventBroadcasterImpl).recordToSink.func1.1(0xc055885290, 0xc0abc7c000, {0x2097590, 0x2fe7d20})
        k8s.io/[email protected]/tools/events/event_broadcaster.go:175 +0x85
k8s.io/client-go/tools/events.(*eventBroadcasterImpl).recordToSink.func1()
        k8s.io/[email protected]/tools/events/event_broadcaster.go:197 +0x28
created by k8s.io/client-go/tools/events.(*eventBroadcasterImpl).recordToSink in goroutine 240
        k8s.io/[email protected]/tools/events/event_broadcaster.go:173 +0xdb 
{code}
This looks very much like the event traffic being sent to the API server is 
being rate-limited, which can cause backlogs like this in busy clusters. 
Ideally, you should adjust the rate-limiting settings on the API server to 
allow more traffic. Otherwise, these goroutines will continue to accumulate 
until the process crashes with an out-of-memory error. The goroutine dump also 
aligns with the heap dump showing lots of JSON objects in memory, as each of 
these blocked goroutines represents an outstanding request to the API server 
to send an event.

I'm going to close this ticket, however, as this is not a YuniKorn issue; it's 
a configuration problem with the Kubernetes API server.
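For reference, the server-side knobs involved are sketched below as a hypothetical kube-apiserver invocation (the values are examples only; on managed control planes such as the reporter's EKS these flags are typically not user-adjustable, and API Priority and Fairness configuration is the alternative):

```shell
# Hypothetical kube-apiserver flags controlling in-flight request limits;
# raising them permits more concurrent (event) traffic. Example values only.
kube-apiserver \
  --max-requests-inflight=800 \
  --max-mutating-requests-inflight=400 \
  ...
```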

> Possible memory leak in scheduler
> ---------------------------------
>
>                 Key: YUNIKORN-2280
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-2280
>             Project: Apache YuniKorn
>          Issue Type: Bug
>          Components: core - scheduler
>    Affects Versions: 1.3.0, 1.4.0
>            Environment: EKS 1.24; we observed the same behavior with YK 1.3.0 & 1.4.0
>            Reporter: Timothy Potter
>            Priority: Major
>         Attachments: goroutine-dump.out, heap-dump-1001.out, 
> heap-dump-1255.out, yunikor-scheduler-process-memory.png, 
> yunikorn-process-memory-last9hours.png, yunikorn-scheduler-goroutines.png
>
>
> Memory for our scheduler pod slowly increases until it gets killed by the 
> kubelet for surpassing its memory limit. 
> I've included two heap dump files collected about 3 hours apart; see the 
> process memory chart for the same period. I'm not really sure what to make of 
> these heap dumps, so I'm hoping someone who knows the code better might have 
> some insights?
> from heap-dump-1001.out:
> {code}
>       flat  flat%   sum%        cum   cum%
>     1.46GB 24.68% 24.68%     1.46GB 24.68%  reflect.unsafe_NewArray
>     1.30GB 21.94% 46.63%     1.32GB 22.35%  sigs.k8s.io/json/internal/golang/encoding/json.(*decodeState).literalStore
>     1.06GB 17.96% 64.58%     1.06GB 17.96%  k8s.io/apimachinery/pkg/apis/meta/v1.(*FieldsV1).UnmarshalJSON
>     0.88GB 14.87% 79.45%     0.88GB 14.87%  reflect.mapassign_faststr0
> {code}
> from heap-dump-1255.out:
> {code}
>       flat  flat%   sum%        cum   cum%
>  1756.18MB 23.53% 23.53%  1756.18MB 23.53%  reflect.unsafe_NewArray
>  1612.36MB 21.60% 45.13%  1645.86MB 22.05%  sigs.k8s.io/json/internal/golang/encoding/json.(*decodeState).literalStore
>  1359.86MB 18.22% 63.35%  1359.86MB 18.22%  k8s.io/apimachinery/pkg/apis/meta/v1.(*FieldsV1).UnmarshalJSON
>  1136.40MB 15.22% 78.57%  1136.40MB 15.22%  reflect.mapassign_faststr0
> {code}
> We also see odd spikes in the number of goroutines, but those don't seem 
> correlated with the increase in memory (mainly mentioning this in case 
> it's unexpected).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
