[
https://issues.apache.org/jira/browse/YUNIKORN-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17786256#comment-17786256
]
Craig Condit edited comment on YUNIKORN-2156 at 11/15/23 9:27 AM:
------------------------------------------------------------------
I’m not sure this is a good idea. For one thing, we don’t want any app-specific
code in the core. We definitely should not be messing with priorities for
different pods within an application. An app may be scheduled as a gang to
enforce a minimum viable number of pods for the app to function properly, but
at some later point demand may spike and additional pods may be needed quite
quickly (the app may submit high-priority pods). Even within Spark this could
happen in the case of Spark streaming from something like a Kafka topic that
has its partition count expanded due to increasing load.
Quota should also not impact this - it’s completely valid for users to schedule
more tasks than they have quota for. Quota is also per-queue and not per-app, so
adding logic like this just doesn’t make sense.
These considerations really need to happen within the application framework
that submits pods (in this case Spark). Only the application itself knows what
priorities should be used or how to handle the case where pods have been
requested but not received in a timely manner.
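For context, the gang itself is already declared by the submitting framework, not
the core: YuniKorn reads task-group definitions from pod annotations. A minimal
sketch of such a pod spec follows (the application id, group name, and sizes are
illustrative, not taken from this issue):

```yaml
# Illustrative Spark executor pod using YuniKorn gang scheduling annotations.
# "minMember" is the minimum viable number of executors the gang reserves;
# pods submitted beyond it (e.g. by dynamic allocation) fall outside the gang.
apiVersion: v1
kind: Pod
metadata:
  name: spark-exec-1                       # example name
  labels:
    applicationId: spark-app-0001          # example application id
    queue: root.spark                      # example queue
  annotations:
    yunikorn.apache.org/task-group-name: spark-executors
    yunikorn.apache.org/task-groups: |
      [{
        "name": "spark-executors",
        "minMember": 3,
        "minResource": {"cpu": "1", "memory": "2Gi"}
      }]
spec:
  schedulerName: yunikorn
  containers:
    - name: executor
      image: apache/spark:3.5.0            # example image
      resources:
        requests:
          cpu: "1"
          memory: "2Gi"
```

Since the annotations are produced by the framework submitting the pods, this is
also where per-pod priority decisions would naturally live, per the argument above.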
> Improvement for gang scheduling when spark dynamic allocation enabled
> ---------------------------------------------------------------------
>
> Key: YUNIKORN-2156
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2156
> Project: Apache YuniKorn
> Issue Type: Improvement
> Components: core - scheduler
> Reporter: Qi Zhu
> Priority: Major
>
> Try to improve the case where Spark Dynamic Allocation is used with gang
> scheduling enabled:
> Could we add some improvements for this case? For example:
> *For the preemption case:*
> Lower the priority of any pods beyond the gang, so that they are killed
> earlier than they would be if they did not belong to a gang.
> *For the quota-hit case:*
> For normal gang scheduling we reject jobs whose total gang size exceeds the
> quota, but for dynamic allocation, could we add monitoring logic to track the
> resources those jobs allocate beyond the quota?
> * We might extend the quota by some amount when that happens?
--
This message was sent by Atlassian Jira
(v8.20.10#820010)