[ https://issues.apache.org/jira/browse/YUNIKORN-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17799238#comment-17799238 ]
Craig Condit commented on YUNIKORN-2280:
----------------------------------------

The number of events emitted is directly proportional to the number of pods we process. “Rate-limiting” them would only result in events not being sent, or not being sent in a timely manner. If a cluster is busy enough to process a given number of pods for scheduling, then it must also be configured to support the events that traffic generates, full stop.

> Possible memory leak in scheduler
> ---------------------------------
>
>                 Key: YUNIKORN-2280
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-2280
>             Project: Apache YuniKorn
>          Issue Type: Bug
>          Components: core - scheduler
>    Affects Versions: 1.3.0, 1.4.0
>         Environment: EKS 1.24; we observed the same behavior with YuniKorn 1.3.0 and 1.4.0
>            Reporter: Timothy Potter
>            Priority: Major
>         Attachments: goroutine-dump.out, heap-dump-1001.out, heap-dump-1255.out, yunikor-scheduler-process-memory.png, yunikorn-process-memory-last9hours.png, yunikorn-scheduler-goroutines.png
>
>
> Memory for our scheduler pod slowly increases until it is killed by the kubelet for exceeding its memory limit.
> I've included two heap dump files collected about 3 hours apart; see the process memory chart for the same period. I'm not really sure what to make of these heap dumps, so I'm hoping someone who knows the code better might have some insights.
> From heap-dump-1001.out:
> {code}
>       flat  flat%   sum%        cum   cum%
>     1.46GB 24.68% 24.68%     1.46GB 24.68%  reflect.unsafe_NewArray
>     1.30GB 21.94% 46.63%     1.32GB 22.35%  sigs.k8s.io/json/internal/golang/encoding/json.(*decodeState).literalStore
>     1.06GB 17.96% 64.58%     1.06GB 17.96%  k8s.io/apimachinery/pkg/apis/meta/v1.(*FieldsV1).UnmarshalJSON
>     0.88GB 14.87% 79.45%     0.88GB 14.87%  reflect.mapassign_faststr0
> {code}
> From heap-dump-1255.out:
> {code}
>       flat  flat%   sum%        cum   cum%
>  1756.18MB 23.53% 23.53%  1756.18MB 23.53%  reflect.unsafe_NewArray
>  1612.36MB 21.60% 45.13%  1645.86MB 22.05%  sigs.k8s.io/json/internal/golang/encoding/json.(*decodeState).literalStore
>  1359.86MB 18.22% 63.35%  1359.86MB 18.22%  k8s.io/apimachinery/pkg/apis/meta/v1.(*FieldsV1).UnmarshalJSON
>  1136.40MB 15.22% 78.57%  1136.40MB 15.22%  reflect.mapassign_faststr0
> {code}
> We also see odd spikes in the number of goroutines, but that doesn't seem correlated with the increase in memory (mainly mentioning this in case it's unexpected).
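
For anyone who wants to reproduce this analysis, here is a minimal sketch of how heap snapshots like the attached ones can be captured and compared. It assumes the process exposes (or can be patched to expose) Go's standard net/http/pprof endpoints; the port and file names are illustrative choices, not YuniKorn's actual configuration.

{code}
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* handlers on the default mux
)

func main() {
	// Hypothetical standalone profiling listener; port 6060 is an
	// illustrative choice, not a YuniKorn default.
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()
	select {} // stand-in for the scheduler's real main loop
}
{code}

With two snapshots taken a few hours apart (e.g. {{curl -o heap-dump-1001.out localhost:6060/debug/pprof/heap}}), running {{go tool pprof -diff_base heap-dump-1001.out heap-dump-1255.out}} reports only the growth between the two profiles, which makes a steady leak easier to separate from steady-state allocations; {{go tool pprof localhost:6060/debug/pprof/goroutine}} gives the corresponding goroutine view.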