[
https://issues.apache.org/jira/browse/YUNIKORN-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758786#comment-17758786
]
Wilfred Spiegelenburg commented on YUNIKORN-1934:
-------------------------------------------------
gianluca perna [16 hours
ago|https://yunikornworkspace.slack.com/archives/CLNUW68MU/p1692868430814269]
{quote}Hello everyone!
We are installing YuniKorn on our RKE cluster because we would like to manage
Spark using the YuniKorn scheduler. We have installed and configured
everything, and it seems to be working so far, but we have some questions that
we hope to find answers to here.
1) When we create queues, is it possible to define resources using percentages
instead of fixed numbers? It would be helpful in case we scale up the cluster
without having to reconfigure everything.
2) We have noticed that the only way we managed to distribute resources using
preemption is as follows.
Let’s assume we have a resource pool of size 100.
User 1 arrives in the cluster, submits the Spark job, and takes all the
resources.
Then comes User 2, submits the job, and preemption “takes away” the guaranteed
resources defined for User 1 and assigns them to User 2.
Is there a way to use a 1/N policy? Meaning, when User 2 arrives, both get 50%
of the resources, with User 3 getting 33% and so on?
3) Another question: to manage all our users with preemption, we created a
“spark” queue under the root queue, and under it a queue for each individual
user. Guaranteed resources were assigned to each user, so that everyone has
some computing power in the worst case. However, we noticed that the
configuration was initially rejected, because we set the max resources of the
spark queue to the actual resources of our cluster, while the sum of the
per-user guaranteed resources was greater than the max resources of the spark
queue. As a workaround, we set the maximum values of the spark queue to
enormous values, so that the sum of the users' guaranteed resources would never
reach that limit. What would be the best practice? I am attaching a photo for
the third point.
Thank you very much!{quote}
!Screenshot 2023-08-24 at 11.07.28.png|width=424,height=100!
!Screenshot 2023-08-24 at 11.07.41.png|width=419,height=106!
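The hierarchy described in point 3 can be sketched as a YuniKorn queues.yaml fragment. Queue names and numbers are illustrative, not taken from the attached screenshots; the per-user guarantees here deliberately sum to more than the spark queue's max, which is the shape of configuration that gets rejected:

```yaml
partitions:
  - name: default
    queues:
      - name: root
        queues:
          - name: spark
            resources:
              max:                 # set to the actual cluster size
                memory: 100Gi
                vcore: 100
            queues:
              - name: user1        # one child queue per user
                resources:
                  guaranteed:
                    memory: 40Gi
                    vcore: 40
              - name: user2
                resources:
                  guaranteed:
                    memory: 40Gi
                    vcore: 40
              - name: user3
                resources:
                  guaranteed:      # sum of guarantees (120) > spark max (100)
                    memory: 40Gi
                    vcore: 40
```

Wilfred's recommendation in the next comment is to drop the max block on the spark queue entirely and let the root queue reflect the cluster size.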
Wilfred Spiegelenburg
{quote}1: there is an open jira for that, Rainie is working on it.
2: no, something like that was discussed, but nothing came of it yet as it was
really complex to implement.
3: my recommendation: do not set a max size on the spark queue; the root queue
already reflects the overall size of your cluster. Using percentages in
your setup with a fixed cluster would be nice. Make sure to add a comment to
the jira:{quote}
{quote}https://issues.apache.org/jira/browse/YUNIKORN-1728{quote}
gianluca perna
{quote}Hi Wilfred, first of all thanks a lot for your hints and time.
About point 2, can I ask why it is complicated? It seems to me that something
similar is already implemented; the problem is just which resources you need to
free.
To be clear, at the moment what preemption does is free an amount of resources
equal to the “guaranteed resources” specified for the user queue, so, based on
the values read from the config. At that point, if the root queue knows the
total resources, why not use a counter of the active users and divide the total
resources by that factor? In the end it is the same mechanism, just with a more
aggressive preemption in that case.
About point 3, we tried leaving the spark queue without any values, but it
seems that if you create a lot of queues (one per user in our case), and the
sum of the guaranteed resources of all the inner queues is greater than the
maximum cluster resources, the system refuses up front to add all the desired
queues.
Maybe we are doing something wrong; I'll run another test this afternoon.
Thanks a lot!{quote}
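The 1/N idea in this comment can be sketched in a few lines of Python. This is only an illustration of the proposed policy, not an existing YuniKorn feature; integer division is used, so rounding remainders are simply dropped:

```python
def fair_share(total, active_queues):
    """Split every resource type evenly across the currently active
    queues: each of the N active queues gets total/N (the 1/N policy)."""
    n = len(active_queues)
    if n == 0:
        return {}
    return {queue: {res: amount // n for res, amount in total.items()}
            for queue in active_queues}

# With two active users each gets half; a third user would drop
# everyone to a third, triggering preemption down to the new share.
shares = fair_share({"vcore": 100, "memory_gb": 300}, ["user1", "user2"])
```

Wilfred's reply below points at the weak spots of exactly this scheme: the shares shrink without bound as N grows, and indivisible resources such as GPUs cannot be split this way.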
Wilfred Spiegelenburg
{quote}Guaranteeing more resources than are available in the cluster is a
problem. At that point you create a state that might never become stable.
Preemption would, in a number of cases, not be able to deliver that guarantee,
which means that you are really back to just FIFO or priority based scheduling.
With 2 it would always be based on queues. We do not guarantee resources for a
user. That means the first assumption would be one queue per user; it would
thus be based on active queues, not directly on users.
Just dividing the cluster into pieces like that could leave you with really
small guarantees for each queue. Really small guarantees do not work,
especially when the guarantee becomes as small as a single allocation. The
other assumption you made is that there is nothing besides these user queues.
If you have a mixed setup with some user based queues and some mixed load
queues with guarantees, it becomes complex.
The last point, around percentages, is that some resources, like GPUs, are low
in count and not splittable. You don't have GPUs in numbers similar to memory
or cpu. Plus 1/6 of a GPU is not possible: it is either 0 or 1. Different
types, different handling…{quote}
gianluca perna
{quote}Understood, thank you. So basically, what would be the right approach in
our case using YuniKorn?
I mean, we have hundreds of users in our cluster, who are clearly not all
active at the same time. So our idea was to split up the guaranteed resources a
little, but in such a way that the sum of the per-user guaranteed resources was
greater than the total of the cluster, because it is generally rare to see more
than 30 percent of the users active at the same time.
Is it really that wrong to create a queue per user?
Thanks a lot for your patience, your help is really appreciated{quote}
Sunil Govindan
{quote}[@Wilfred
Spiegelenburg|https://yunikornworkspace.slack.com/team/ULRU2BU6B] can they use
dynamic queues per user, with a dynamic max capacity and a defined guaranteed
capacity?{quote}
Wilfred Spiegelenburg
{quote}I still think the % approach is the right way on a per user queue.
I think we need to combine that with a minimum for the resulting value we
calculate based on that %. The result should never be less than 20GB/1 cpu etc.
Exclude the percentage based quota from the size check. We already do that
implicitly when we use a child template. In that case we mostly circumvent the
whole more-guaranteed-than-available case.
We might need to combine that with the limitation that we do not allow mixing
of fixed and percentage based values at the same level in the tree.
One other thing we can think of is allowing an oversubscribed guaranteed quota.
Capturing all this in a jira{quote}
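The percentage-plus-floor proposal in this comment can be sketched as follows. The 20GB/1 cpu floor comes from the comment itself; the function and resource names are hypothetical, not YuniKorn API:

```python
def guaranteed_quota(cluster, percent, floor):
    """Guaranteed quota for one queue: a percentage of the cluster size,
    but never less than a fixed per-resource floor (e.g. 20 GB / 1 cpu)."""
    return {res: max(int(total * percent / 100), floor.get(res, 0))
            for res, total in cluster.items()}

# 1% of a 1000 GB / 500 core cluster would be 10 GB, so the 20 GB floor
# applies to memory, while the 5 vcore result stays above the 1 cpu floor.
quota = guaranteed_quota({"memory_gb": 1000, "vcore": 500},
                         percent=1,
                         floor={"memory_gb": 20, "vcore": 1})
```

Note that the floor can push the sum of guarantees over the cluster size again for many small queues, which is presumably why the comment also mentions excluding percentage based quotas from the size check and allowing oversubscription.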
> Guaranteed quota distribution
> -----------------------------
>
> Key: YUNIKORN-1934
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1934
> Project: Apache YuniKorn
> Issue Type: Improvement
> Components: core - scheduler
> Reporter: Wilfred Spiegelenburg
> Priority: Major
> Attachments: Screenshot 2023-08-24 at 11.07.28.png, Screenshot
> 2023-08-24 at 11.07.41.png
>
>
> Discussion on Slack around guaranteed quota distribution; full discussion in
> the comments.
> Main points:
> * percentage for guaranteed quota
> * limitation of sum of guaranteed quota for queues to the cluster size when
> not all queues are active
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]