[
https://issues.apache.org/jira/browse/YUNIKORN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17891510#comment-17891510
]
Wilfred Spiegelenburg commented on YUNIKORN-2933:
-------------------------------------------------
We have multiple cases that we need to describe and handle:
# duplicate name with all other details same
# duplicate name with different resources
# duplicate name with different tolerations etc.
Case 1 we could possibly handle by merging into one task group, for case 2 we
could use the largest resources, but case 3 becomes really complex. It then also
becomes a question of which one we keep. Again, cases 1 and 2 are probably
doable, but case 3 will not be simple and will cause all kinds of problems.
Combine case 3 with case 2 and you could have conflicting rules. We need a
simple solution that always works and can be tracked simply via the pod.
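To make the three cases concrete, here is a minimal Go sketch of how two task groups with the same name could be compared. The TaskGroup struct, the Diff type and the compare function are hypothetical simplifications for illustration and are not the shim's real types. Treating minMember and tolerations as the "other details" of case 3 is also an assumption.

```go
package main

import (
	"fmt"
	"reflect"
)

// TaskGroup is a hypothetical, simplified task group definition.
// Tolerations are reduced to plain strings for illustration only.
type TaskGroup struct {
	Name        string
	MinMember   int32
	MinResource map[string]string
	Tolerations []string
}

// Diff records how two task groups with the same name differ.
type Diff struct {
	Resources bool // case 2: minResource differs
	Other     bool // case 3: anything else differs (assumed: minMember, tolerations)
}

// compare assumes a and b already share the same name.
func compare(a, b TaskGroup) Diff {
	return Diff{
		Resources: !reflect.DeepEqual(a.MinResource, b.MinResource),
		Other: a.MinMember != b.MinMember ||
			!reflect.DeepEqual(a.Tolerations, b.Tolerations),
	}
}

func main() {
	base := TaskGroup{
		Name:        "spark-executor",
		MinMember:   2,
		MinResource: map[string]string{"cpu": "1", "memory": "1Gi"},
	}

	same := base // case 1: every detail is identical

	bigger := base // case 2: only the resources differ
	bigger.MinResource = map[string]string{"cpu": "2", "memory": "2Gi"}

	tolerant := bigger // case 2 and case 3 combined
	tolerant.Tolerations = []string{"dedicated=spark:NoSchedule"}

	fmt.Printf("case 1: %+v\n", compare(base, same))
	fmt.Printf("case 2: %+v\n", compare(base, bigger))
	fmt.Printf("case 2+3: %+v\n", compare(base, tolerant))
}
```

The last comparison sets both flags, which is the combination where any merge rule would have to pick a winner.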
I do not think we can create a simple fix for this in any other way than
failing this in the admission controller. That would reject the submission and
not let it get to the k8shim. Next would be to handle the case where there is no
admission controller. Leaving that pod pending (i.e. not sending it to the
core) and pushing some events to the pod saying that it is "broken" is the best
way to track and manage this.
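A minimal Go sketch of the check that such a rejection could be built on, assuming the annotation value is the JSON array shown in the issue description. The TaskGroup struct and the duplicateTaskGroupNames and validateTaskGroups functions are hypothetical names for illustration. This is not the actual admission controller or shim code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// TaskGroup is a hypothetical, trimmed-down view of one entry in the
// yunikorn.apache.org/task-groups annotation.
type TaskGroup struct {
	Name        string            `json:"name"`
	MinMember   int32             `json:"minMember"`
	MinResource map[string]string `json:"minResource"`
}

// duplicateTaskGroupNames parses the annotation value and returns, in
// sorted order, every task group name that occurs more than once.
func duplicateTaskGroupNames(annotation string) ([]string, error) {
	var groups []TaskGroup
	if err := json.Unmarshal([]byte(annotation), &groups); err != nil {
		return nil, fmt.Errorf("invalid task-groups annotation: %w", err)
	}
	counts := make(map[string]int)
	for _, g := range groups {
		counts[g.Name]++
	}
	var dups []string
	for name, n := range counts {
		if n > 1 {
			dups = append(dups, name)
		}
	}
	sort.Strings(dups)
	return dups, nil
}

// validateTaskGroups rejects the annotation as a whole when any name
// repeats. It makes no attempt to merge or pick one of the definitions.
func validateTaskGroups(annotation string) error {
	dups, err := duplicateTaskGroupNames(annotation)
	if err != nil {
		return err
	}
	if len(dups) > 0 {
		return fmt.Errorf("duplicate task group names: %v", dups)
	}
	return nil
}

func main() {
	// The task-groups value from the issue description.
	annotation := `[
	  {"name": "spark-executor", "minMember": 2,
	   "minResource": {"cpu": "1", "memory": "1Gi"}},
	  {"name": "spark-driver", "minMember": 1,
	   "minResource": {"cpu": "1", "memory": "1Gi"}},
	  {"name": "spark-executor", "minMember": 1,
	   "minResource": {"cpu": "2", "memory": "2Gi"}}
	]`
	if err := validateTaskGroups(annotation); err != nil {
		fmt.Println("rejected:", err)
		return
	}
	fmt.Println("accepted")
}
```

The same error could back both paths: a denied admission request when the admission controller is deployed, or an event on the pod that is left pending when it is not.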
If we make a choice to merge or ignore some pieces it will always be the wrong
choice for someone. We thus do not make that choice and push it back to the
submitter to figure out the mistake.
> Don't add duplicated taskGroup to app
> -------------------------------------
>
> Key: YUNIKORN-2933
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2933
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: shim - kubernetes
> Affects Versions: 1.3.0, 1.5.2
> Reporter: Xiaobao Wu
> Priority: Minor
> Labels: pull-request-available
>
> When the app processes the _yunikorn.apache.org/task-groups_ annotation
> on a Pod, it does not consider the case of task groups with the same
> {*}name{*}. For example, the following task-groups
> information is defined:
>
> {code:java}
> yunikorn.apache.org/task-groups='
> [
>   {
>     "name": "spark-executor",
>     "minMember": 2,
>     "minResource": {
>       "cpu": "1",
>       "memory": "1Gi"
>     }
>   },
>   {
>     "name": "spark-driver",
>     "minMember": 1,
>     "minResource": {
>       "cpu": "1",
>       "memory": "1Gi"
>     }
>   },
>   {
>     "name": "spark-executor",
>     "minMember": 1,
>     "minResource": {
>       "cpu": "2",
>       "memory": "2Gi"
>     }
>   }
> ]
> '{code}
> In the above example there are two task groups with
> *the same name* in the task-groups annotation. So, after the placeholder (ph)
> tasks are created, if an executor task (2C 2G)
> tries to claim a placeholder's resources, it may erroneously release the ph
> task in the *spark-executor* group with a resource specification of 1C 1G
> (because that resource specification cannot satisfy 2C 2G). In this regard, I
> think that until ph tasks with different resource
> specifications within the same task group are handled properly, it is
> necessary to avoid having two task groups with the same name but different
> resource specifications in the app.
>