Thanks for your answer. You are right -- this is a single-node cluster, and I 
was just relying on the node's resource limits; I was not setting a queue quota.

You are also right that when I use annotations on the namespace to set queue 
quotas, it works as expected: A-3 gets scheduled before B-1.
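
For reference, the namespace annotations I set look roughly like this (quoting 
from memory, so treat it as a sketch; the annotation names are the ones from 
the resource quota doc you linked, and "test-ns" is just a placeholder):

apiVersion: v1
kind: Namespace
metadata:
  name: test-ns
  annotations:
    yunikorn.apache.org/namespace.max.cpu: "4"
    yunikorn.apache.org/namespace.max.memory: "2Gi"

With a 4-core maximum on the queue, only two of the 2-core pods can run at 
once, and A-3 is then scheduled ahead of B-1 as expected.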

While trying this some more, I also noticed that even when I do *not* set a 
queue quota, the order is as expected (A-3, then B-1) on the first run after 
restarting Yunikorn. But on subsequent runs, the order is always B-1, then A-3.

Are queue quotas required to make this work in the expected way? Or should it 
also work when resources are only constrained at the node or cluster level?

I should probably mention: I am doing this in a cluster created with Rancher 
2.5.5 and running k3s v1.18.8+k3s1. I don’t know if that matters or not.

> -----Original Message-----
> From: Weiwei Yang <[email protected]>
> Sent: Friday, February 5, 2021 4:47 PM
> To: [email protected]
> Subject: Re: Unexpected scheduler behavior with K8s
> 
> Hi Patrick
> 
> Thanks for reaching out.
> I have one question about how you limit it so that only 2 pods run at a
> time. Are you running this on a single-node cluster and limiting that at the
> node resource level?
> Based on the use case you described, my guess is that the scheduler creates
> reservations for B-1 and A-3, and when we unreserve those reservations, the
> FIFO ordering is not strictly honored.
> Have you set any queue quota? Here is the doc on how to set up the
> namespace-to-queue mapping with a quota:
> http://yunikorn.apache.org/docs/next/user_guide/resource_quota_management#namespace-to-queue-mapping
> If we limit the running pods by quota, then I think we will see the expected
> behavior.
> The document about the application sorting policy is here:
> http://yunikorn.apache.org/docs/next/user_guide/sorting_policies#application-sorting
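> 
> As an illustration, a queues.yaml along those lines might look like this
> (just a sketch, not tested; the queue names and resource values are
> placeholders, and if I remember correctly memory is in MB and vcore in
> millicores):
> 
> partitions:
>   - name: default
>     queues:
>       - name: root
>         queues:
>           - name: default
>             properties:
>               application.sort.policy: fifo
>             resources:
>               max:
>                 memory: 2048
>                 vcore: 4000
> 
> With a max like that on the queue, only two 2-vcore/1GB pods fit at a time,
> so the quota (rather than node capacity) is what limits concurrency, and the
> FIFO ordering is applied as the queue frees up.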
> 
> Weiwei
> 
> On Fri, Feb 5, 2021 at 1:04 PM Patrick, Alton (US)
> <[email protected]> wrote:
> 
> > Hi. I have just started using Yunikorn for scheduling with K8s. I have
> > been doing some simple experiments to make sure I understand how it works.
> > Most of them are working as expected, but there is one I don't understand.
> >
> > I used Helm to deploy Yunikorn and I did not modify anything, so I have
> > the default setup: one queue per namespace, and I assume the application
> > sort policy is the default of "FifoSortPolicy."
> >
> > I create four pods. All of them have identical resource requests (2
> > cores, 1Gi memory), sized so that only two can run at a time. The pods are
> > created in this order, with a 1-second gap between each (a sketch of one
> > pod spec follows the list):
> >
> > 1. A-1, applicationId = A, sleeps for 10s
> > 2. A-2, applicationId = A, sleeps for 5s
> > 3. B-1, applicationId = B, sleeps for 5s
> > 4. A-3, applicationId = A, sleeps for 5s
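> >
> > For reference, each pod spec looks roughly like this (abbreviated; the
> > name, applicationId label, and sleep time vary per pod as listed above,
> > and the image is a placeholder):
> >
> > apiVersion: v1
> > kind: Pod
> > metadata:
> >   name: a-1
> >   labels:
> >     applicationId: "A"
> > spec:
> >   schedulerName: yunikorn
> >   containers:
> >     - name: sleep
> >       image: alpine
> >       command: ["sleep", "10"]
> >       resources:
> >         requests:
> >           cpu: "2"
> >           memory: "1Gi"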
> >
> > What I expect to see is:
> > * A-1 is scheduled
> > * A-2 is scheduled
> > * A-2 finishes
> > * A-3 is scheduled (because A is the first application created, my
> > understanding is that as long as there are pending pods for application A,
> > they should have priority over pods for application B)
> >
> > What I see instead is that after A-2 finishes, B-1 gets scheduled to run.
> >
> > Is this the expected behavior, and if so can someone explain what is wrong
> > with my understanding?
> >
> > Additionally, in the logs for the scheduler pod, right after pod B-1 gets
> > scheduled, I see the following messages repeated thousands of times very
> > quickly (over 2000 instances in about 0.25 s, according to the timestamps
> > in the log). Is this normal?
> >
> > 2021-02-05T20:25:34.479Z DEBUG scheduler/scheduling_application.go:641 skipping node for allocation: basic condition not satisfied {"node": "local-node", "allocationKey": "258d9947-e92b-4967-9758-08eee62f4d1b", "error": "pre alloc check: requested resource map[memory:1074 vcore:2000] is larger than currently available map[ephemeral-storage:50977832921 hugepages-2Mi:0 memory:13250 pods:110 vcore:1600] resource on local-node"}
> > 2021-02-05T20:25:34.479Z DEBUG scheduler/scheduling_node.go:271 requested resource is larger than currently available node resources {"nodeID": "local-node", "requested": "map[memory:1074 vcore:2000]", "available": "map[ephemeral-storage:50977832921 hugepages-2Mi:0 memory:13250 pods:110 vcore:1600]"}
> >
