Si Latt created YUNIKORN-1076:
---------------------------------
Summary: Robust handling of invalid Task Group annotation
Key: YUNIKORN-1076
URL: https://issues.apache.org/jira/browse/YUNIKORN-1076
Project: Apache YuniKorn
Issue Type: Bug
Components: core - scheduler
Reporter: Si Latt
For gang scheduling, task group information has to be defined in the annotation
section. If the provided YAML for the task group info is invalid, for example
because a double quote is missing around a key, parsing fails with an exception
that is only written to the YuniKorn (YK) log. The Kubernetes event log gives no
indication that an exception occurred during gang scheduling. Other gang
scheduling events are still logged, which gives users the wrong impression that
the pods were gang scheduled without any issue.
The current behavior is dangerous because it causes gang scheduling to stop
working in production without users noticing any problem. We should probably
take the following actions:
# Reject / fail any app with an invalid task group annotation.
# Emit the exception as a Kubernetes event, so that critical events happening
in the cluster are surfaced to users.
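
The two actions above could look roughly like the sketch below. It is only an
illustration, not YuniKorn code: it assumes the annotation value is a JSON list
of task groups, and the TaskGroup struct, its field names, ParseTaskGroups and
CheckApp are all hypothetical. The returned message stands in for the text that
would be emitted as a Kubernetes event; no Kubernetes API is called here.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// TaskGroup is a hypothetical, trimmed-down shape of one task group entry.
// The field names are assumptions made for this sketch.
type TaskGroup struct {
	Name      string `json:"name"`
	MinMember int32  `json:"minMember"`
}

// ParseTaskGroups decodes the annotation value and returns an error instead
// of silently continuing when the value is malformed or incomplete.
func ParseTaskGroups(annotation string) ([]TaskGroup, error) {
	var groups []TaskGroup
	if err := json.Unmarshal([]byte(annotation), &groups); err != nil {
		return nil, fmt.Errorf("invalid task group annotation: %w", err)
	}
	if len(groups) == 0 {
		return nil, errors.New("task group annotation defines no task groups")
	}
	for i, g := range groups {
		if g.Name == "" {
			return nil, fmt.Errorf("task group %d: name is required", i)
		}
		if g.MinMember <= 0 {
			return nil, fmt.Errorf("task group %q: minMember must be positive", g.Name)
		}
	}
	return groups, nil
}

// CheckApp reports whether the app should be accepted. When it is rejected,
// the second return value is the message that would be surfaced to the user,
// for example as a Kubernetes event (hypothetical wording).
func CheckApp(appID, annotation string) (bool, string) {
	if _, err := ParseTaskGroups(annotation); err != nil {
		return false, fmt.Sprintf("app %s rejected: %v", appID, err)
	}
	return true, ""
}

func main() {
	// The key "name" is missing its double quotes, as in the reported case.
	bad := `[{name: "workers", "minMember": 3}]`
	accepted, msg := CheckApp("app-1", bad)
	fmt.Println(accepted, msg)
}
```

Returning the error to the caller, rather than only logging it, is what lets
the app be failed and the reason be shown outside the scheduler log.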
--
This message was sent by Atlassian Jira
(v8.20.1#820001)