Peter Bacsko created YUNIKORN-1169:
--------------------------------------
Summary: Fix ApplicationMetadata restoration during recovery
Key: YUNIKORN-1169
URL: https://issues.apache.org/jira/browse/YUNIKORN-1169
Project: Apache YuniKorn
Issue Type: Bug
Components: shim - kubernetes
Reporter: Peter Bacsko
The following code in {{general.go}} handles the recovery part:
{noformat}
for _, pod := range appPods {
    log.Logger().Debug("Looking at pod for recovery candidates",
        zap.String("podNamespace", pod.Namespace), zap.String("podName", pod.Name))
    // general filter passes, and pod is assigned
    // this means the pod is already scheduled by scheduler for an existing app
    if utils.GeneralPodFilter(pod) && utils.IsAssignedPod(pod) {
        if meta, ok := os.getAppMetadata(pod); ok {
            podsRecovered++
            log.Logger().Debug("Adding appID as recovery candidate", zap.String("appID", meta.ApplicationID))
            if _, exist := existingApps[meta.ApplicationID]; !exist {
                existingApps[meta.ApplicationID] = meta
            }
            ...
{noformat}
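The crucial part is the handling of the {{existingApps}} map. An entry is
written only once per application ID, using the metadata of the first matching
pod; later pods of the same application are ignored. However, there's no
guarantee that all pods of an application have the same tags or
ownerReferences, so the recovered metadata depends on which pod is processed
first.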
The scope of this JIRA is to analyze the possible side effects of this code and
come up with a better solution. A bug caused by this behavior has already been
identified (see YUNIKORN-1161).
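
To make the concern concrete, here is a minimal, self-contained sketch of the "first pod wins" pattern and of one possible alternative that merges tags across pods. It is an illustration only, not the shim's code and not a proposed fix: {{appMeta}}, {{recoverFirstWins}}, {{recoverMerged}} and the tag keys are hypothetical names, and {{appMeta}} is a simplified stand-in for the real metadata type.

```go
package main

import "fmt"

// appMeta is a simplified, hypothetical stand-in for the shim's
// application metadata; only the fields needed for the example exist.
type appMeta struct {
	ApplicationID string
	Tags          map[string]string
}

// recoverFirstWins mirrors the pattern in the issue: only the first pod
// seen for an application ID contributes metadata.
func recoverFirstWins(pods []appMeta) map[string]appMeta {
	existing := make(map[string]appMeta)
	for _, meta := range pods {
		if _, exist := existing[meta.ApplicationID]; !exist {
			existing[meta.ApplicationID] = meta
		}
	}
	return existing
}

// recoverMerged is one hypothetical alternative: tags from every pod of
// an application are merged, keeping the first value seen for each key.
func recoverMerged(pods []appMeta) map[string]appMeta {
	existing := make(map[string]appMeta)
	for _, meta := range pods {
		merged, exist := existing[meta.ApplicationID]
		if !exist {
			merged = appMeta{
				ApplicationID: meta.ApplicationID,
				Tags:          map[string]string{},
			}
			existing[meta.ApplicationID] = merged
		}
		// merged.Tags shares its map with the stored entry, so writes
		// here update the entry in existing as well.
		for k, v := range meta.Tags {
			if _, seen := merged.Tags[k]; !seen {
				merged.Tags[k] = v
			}
		}
	}
	return existing
}

func main() {
	// Two pods of the same application with different (made-up) tags.
	pods := []appMeta{
		{ApplicationID: "app-1", Tags: map[string]string{"namespace": "default"}},
		{ApplicationID: "app-1", Tags: map[string]string{"namespace": "default", "owner": "driver-pod"}},
	}
	fmt.Println("first wins:", recoverFirstWins(pods)["app-1"].Tags)
	fmt.Println("merged:    ", recoverMerged(pods)["app-1"].Tags)
}
```

With the first-wins variant the {{owner}} tag of the second pod is lost, and swapping the pod order changes the result. Whether merging is the right behavior (as opposed to, for example, preferring a specific pod) is exactly what this JIRA needs to analyze.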