Mit Desai created YUNIKORN-3192:
-----------------------------------

             Summary: Easier multi-tenant queue management
                 Key: YUNIKORN-3192
                 URL: https://issues.apache.org/jira/browse/YUNIKORN-3192
             Project: Apache YuniKorn
          Issue Type: New Feature
            Reporter: Mit Desai


Apache YuniKorn manages partition and queue configuration through a 
{{queues.yaml}} section stored in the {{yunikorn-configs}} ConfigMap, which 
defines the full hierarchy under the {{root}} queue, including capacities, 
ACLs, and placement rules. This central configuration is shared by all tenants, 
so any change to tenant queues requires editing the same global YAML file.

In multi-tenant environments, tenants are often represented as child queues 
under a common root (for example {{root.tenants.team-a}}, 
{{root.tenants.team-b}}), with per-tenant quotas and ACLs defined in 
{{queues.yaml}}. As the number of tenants grows, this monolithic 
configuration becomes difficult to maintain, since onboarding a new tenant, 
updating quotas, or offboarding an existing tenant all involve manual edits to 
a single, increasingly complex document.
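
For illustration, a minimal sketch of what such a shared configuration might look like with two tenants. The queue paths come from the example above; the namespace, user and group names, and all quota values are hypothetical and only show the shape of the document:

```yaml
# Illustrative sketch only. The namespace, users, groups, and resource
# values are assumptions, not taken from a real cluster.
apiVersion: v1
kind: ConfigMap
metadata:
  name: yunikorn-configs
  namespace: yunikorn              # assumed install namespace
data:
  queues.yaml: |
    partitions:
      - name: default
        queues:
          - name: root
            queues:
              - name: tenants
                parent: true
                queues:
                  - name: team-a
                    # ACL format: comma-separated users, a space, then groups
                    submitacl: "alice,bob team-a-devs"
                    resources:
                      guaranteed: {memory: 100G, vcore: 50}
                      max: {memory: 200G, vcore: 100}
                  - name: team-b
                    submitacl: "carol team-b-devs"
                    resources:
                      guaranteed: {memory: 50G, vcore: 20}
                      max: {memory: 100G, vcore: 40}
```

Every tenant lives in the same {{queues.yaml}} value, so adding a hypothetical {{team-c}} or changing the quota for {{team-a}} means editing and re-applying this one ConfigMap.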

This setup introduces several operational issues:
 * Small, tenant-specific changes require touching a cluster-wide 
configuration, increasing the risk that a mistake affects all tenants.

 * Manual YAML editing is error-prone; syntax or structural errors in 
{{queues.yaml}} can break or degrade scheduling behavior for the entire cluster.

 * There is no clear way to delegate safe queue management to tenant or 
platform teams, because they must modify the same shared configuration as 
cluster administrators.

 * Day‑2 operations such as tenant onboarding, quota resizing, and tenant 
retirement become slow and fragile, because they depend on careful manual 
updates to a central ConfigMap.

The current model is powerful in terms of what can be expressed, but it is not 
easy to operate at scale for dynamic, multi-tenant workloads, where queue 
changes are frequent and need to be performed safely and with minimal blast 
radius.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
