[
https://issues.apache.org/jira/browse/YUNIKORN-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17799182#comment-17799182
]
Timothy Potter commented on YUNIKORN-2280:
------------------------------------------
ok, thank you for figuring this seemingly obvious issue out :( are there any
metrics in YuniKorn that we can monitor that would indicate this is the issue?
we're using EKS, so I'll have to figure out how to get more capacity there.
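
For reference, the two attached heap dumps can be compared directly with
pprof's base-profile diff, assuming they were captured from the scheduler's
pprof heap endpoint (a sketch, not an official diagnostic procedure):
{code}
# Diff the later dump against the earlier one; allocations that grow
# between the two profiles are the likely leak candidates.
go tool pprof -base heap-dump-1001.out heap-dump-1255.out
# then, at the interactive (pprof) prompt:
#   top
{code}
Entries such as FieldsV1.UnmarshalJSON growing between the two profiles would
point at retained Kubernetes object metadata rather than scheduler-core state.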
> Possible memory leak in scheduler
> ---------------------------------
>
> Key: YUNIKORN-2280
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2280
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: core - scheduler
> Affects Versions: 1.3.0, 1.4.0
> Environment: EKS 1.24, we observed same behavior with YK 1.3.0 & 1.4.0
> Reporter: Timothy Potter
> Priority: Major
> Attachments: goroutine-dump.out, heap-dump-1001.out,
> heap-dump-1255.out, yunikor-scheduler-process-memory.png,
> yunikorn-process-memory-last9hours.png, yunikorn-scheduler-goroutines.png
>
>
> Memory for our scheduler pod slowly increases until it gets killed by kubelet
> for surpassing its memory limit.
> I've included two heap dump files collected about 3 hours apart; see the
> process memory chart for the same period. I'm not really sure what to make
> of these heap dumps, so I'm hoping someone who knows the code better might
> have some insights?
> from heap-dump-1001.out:
> {code}
> flat flat% sum% cum cum%
> 1.46GB 24.68% 24.68% 1.46GB 24.68% reflect.unsafe_NewArray
> 1.30GB 21.94% 46.63% 1.32GB 22.35% sigs.k8s.io/json/internal/golang/encoding/json.(*decodeState).literalStore
> 1.06GB 17.96% 64.58% 1.06GB 17.96% k8s.io/apimachinery/pkg/apis/meta/v1.(*FieldsV1).UnmarshalJSON
> 0.88GB 14.87% 79.45% 0.88GB 14.87% reflect.mapassign_faststr0
> {code}
> from heap-dump-1255.out:
> {code}
> flat flat% sum% cum cum%
> 1756.18MB 23.53% 23.53% 1756.18MB 23.53% reflect.unsafe_NewArray
> 1612.36MB 21.60% 45.13% 1645.86MB 22.05% sigs.k8s.io/json/internal/golang/encoding/json.(*decodeState).literalStore
> 1359.86MB 18.22% 63.35% 1359.86MB 18.22% k8s.io/apimachinery/pkg/apis/meta/v1.(*FieldsV1).UnmarshalJSON
> 1136.40MB 15.22% 78.57% 1136.40MB 15.22% reflect.mapassign_faststr0
> {code}
> We also see odd spikes in the number of goroutines, but that doesn't seem
> correlated with the increase in memory (mainly mentioning this in case it's
> unexpected).
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]