[jira] [Commented] (YUNIKORN-2933) Don't add duplicated taskGroup to app

Wilfred Spiegelenburg (Jira) Mon, 21 Oct 2024 04:40:04 -0700


    [ 
https://issues.apache.org/jira/browse/YUNIKORN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17891510#comment-17891510
 ]


Wilfred Spiegelenburg commented on YUNIKORN-2933:
-------------------------------------------------

We have multiple cases that we need to describe and handle:
 # duplicate name with all other details same
 # duplicate name with different resources
 # duplicate name with different tolerations etc.

Case 1 we could possibly handle by merging into one task group, and for case 2 
we could use the largest resources, but case 3 becomes really complex. It then 
also becomes a problem of which one we keep. Again, cases 1 and 2 are probably 
doable, but case 3 will not be simple and will cause all kinds of problems. 
Combine case 3 with case 2 and you could have conflicting rules. We need to 
create a simple solution that always works and can be tracked simply via the 
pod.
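
To make the three cases concrete, here is a minimal sketch in Python of how 
duplicates could be told apart once the annotation JSON has been parsed. It is 
not YuniKorn code: classify_duplicates, the helper and the case labels are 
hypothetical names used only for illustration, and it assumes "minResource" is 
the only field that counts as "resources".

```python
from collections import defaultdict


def _without_resources(task_group):
    """Copy of a task group definition with minResource left out."""
    return {k: v for k, v in task_group.items() if k != "minResource"}


def classify_duplicates(task_groups):
    """Map each duplicated task group name to the case it falls under."""
    by_name = defaultdict(list)
    for tg in task_groups:
        by_name[tg["name"]].append(tg)

    cases = {}
    for name, defs in by_name.items():
        if len(defs) < 2:
            continue  # unique name, nothing to classify
        first, rest = defs[0], defs[1:]
        if all(d == first for d in rest):
            cases[name] = "identical"            # case 1
        elif all(_without_resources(d) == _without_resources(first) for d in rest):
            cases[name] = "different-resources"  # case 2
        else:
            cases[name] = "different-other"      # case 3, possibly combined with 2
    return cases
```

Note that in this sketch the example from the issue description ends up in the 
third bucket, because its two spark-executor entries differ in minMember as 
well as in minResource. That is the kind of ambiguity that makes a merge rule 
hard to define.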

I do not think we can create a simple fix for this in any other way than 
failing this in the admission controller. That would reject the submission and 
not let it get to the k8shim. Next would be to handle the case that there is no 
admission controller. Leaving that pod pending (i.e. not sending it to the 
core) and pushing some events to the pod saying that it is "broken" is the best 
way to track and manage this.
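
The check itself could be as small as the following Python sketch. It only 
illustrates the "reject on duplicate name" rule and is not the admission 
controller's or the shim's actual code; validate_task_groups and the error 
wording are hypothetical.

```python
import json
from collections import Counter


def validate_task_groups(annotation_value):
    """Return a list of error strings; an empty list means no duplicate names."""
    task_groups = json.loads(annotation_value)
    counts = Counter(tg["name"] for tg in task_groups)
    return [
        f"task group name {name!r} is defined {n} times"
        for name, n in counts.items()
        if n > 1
    ]
```

With the annotation value from the issue description this returns a single 
error for spark-executor. The caller would then either reject the submission 
(admission controller) or leave the pod pending and attach the message as a pod 
event (no admission controller).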

If we make a choice to merge/ignore some pieces it will always be the wrong 
choice for someone. We thus do not make that choice and push it back to the 
submitter to figure out the mistake.

> Don't add duplicated taskGroup to app
> -------------------------------------
>
>                 Key: YUNIKORN-2933
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-2933
>             Project: Apache YuniKorn
>          Issue Type: Bug
>          Components: shim - kubernetes
>    Affects Versions: 1.3.0, 1.5.2
>            Reporter: Xiaobao Wu
>            Priority: Minor
>              Labels: pull-request-available
>
>     When the app processes the _yunikorn.apache.org/task-groups_ annotation 
> on a Pod, it does not consider the case of task groups with the same 
> {*}name{*}. For example, the following task-groups information is defined:
>  
> {code:java}
> yunikorn.apache.org/task-groups='
> [{
>         "name": "spark-executor",
>         "minMember": 2,
>         "minResource": {
>             "cpu": "1",
>             "memory": "1Gi"
>         }
>     }, {
>         "name": "spark-driver",
>         "minMember": 1,
>         "minResource": {
>             "cpu": "1",
>             "memory": "1Gi"
>         }
>     },
>     {
>         "name": "spark-executor",
>         "minMember": 1,
>         "minResource": {
>             "cpu": "2",
>             "memory": "2Gi"
>         }
>     }
> ]
> '{code}
>     From the above example, it can be seen that there are two task groups 
> with *the same name* in the task-groups annotation. So, after these 
> placeholder (ph) tasks are created, if an executor task (2C 2G) tries to 
> allocate a placeholder's resources, it may erroneously release the ph task 
> in the *spark-executor* group with a resource specification of 1C 1G 
> (because that resource specification cannot meet 2C 2G). In this regard, I 
> think that before properly handling ph tasks with different resource 
> specifications within the same task group, it is necessary to avoid having 
> two task groups with the same name but different resource specifications in 
> the app.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

