Thanks for your answer. You are right -- this is a single-node cluster, and I was just relying on the limits on the node. I was not setting a queue quota.
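For concreteness, when I did set a quota I annotated the namespace roughly like this (the annotation keys are the ones I took from the resource quota management doc you linked below; I may not have the exact names right for every YuniKorn version, so treat it as a sketch -- the values are sized so only two of my test pods fit at once):

    apiVersion: v1
    kind: Namespace
    metadata:
      name: default
      annotations:
        # Queue quota for the namespace-mapped queue (keys as I read them
        # from the resource quota management doc; names may vary by version).
        yunikorn.apache.org/namespace.max.cpu: "4"
        yunikorn.apache.org/namespace.max.memory: "2Gi"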
You are also right that when I use annotations on the namespace to set queue quotas (roughly as sketched above), it works as expected: A-3 gets scheduled before B-1. While trying this some more, I also noticed that even when I do *not* set a queue quota, the order is as expected (A-3, then B-1) after restarting YuniKorn. But on subsequent runs, the order is always B-1, then A-3. Is it required to have queue quotas to make this work in the expected way? Or should it also work if resources are only constrained at the node or cluster level?

I should probably mention: I am doing this in a cluster created with Rancher 2.5.5 and running k3s v1.18.8+k3s1. I don't know if that matters or not. (I have also put a sketch of the pod spec I am using, and the sort-policy config as I understand it, below the quoted message in case that helps.)

> -----Original Message-----
> From: Weiwei Yang <[email protected]>
> Sent: Friday, February 5, 2021 4:47 PM
> To: [email protected]
> Subject: Re: Unexpected scheduler behavior with K8s
>
> Hi Patrick
>
> Thanks for reaching out.
> I have one question about how you limit it so that only 2 pods run at a
> time. Are you running this on a single-node cluster and limiting that at
> the node resource level?
> Based on the use case you described, my guess is the scheduler creates
> reservations for B-1 and A-3, and when we unreserve these reservations,
> the FIFO ordering was not strictly honored.
> Have you set any queue quota? Here is the doc about how to set the queue
> mapping with a quota:
> http://yunikorn.apache.org/docs/next/user_guide/resource_quota_management#namespace-to-queue-mapping
> If we limit the running pods by quota, then I think we will see the
> expected behavior.
> The document about the application sorting policy is here:
> http://yunikorn.apache.org/docs/next/user_guide/sorting_policies#application-sorting
>
> Weiwei
>
> On Fri, Feb 5, 2021 at 1:04 PM Patrick, Alton (US)
> <[email protected]> wrote:
>
> > Hi. I have just started using YuniKorn for scheduling with K8s. I have
> > been doing some simple experiments to make sure I understand how it
> > works. Most of them are working as expected, but there is one I don't
> > understand.
> >
> > I used Helm to deploy YuniKorn and did not modify anything, so I have
> > the default setup: one queue per namespace, and I assume the application
> > sort policy is the default of "FifoSortPolicy".
> >
> > I create four pods. All of them have the same resource requests (2 cores,
> > 1Gi of memory), and the requests are such that only two can run at a
> > time. The pods are created in this order, with a 1-second gap between
> > each:
> >
> > 1. A-1, applicationId = A, sleeps for 10s
> > 2. A-2, applicationId = A, sleeps for 5s
> > 3. B-1, applicationId = B, sleeps for 5s
> > 4. A-3, applicationId = A, sleeps for 5s
> >
> > What I expect to see is:
> > * A-1 is scheduled
> > * A-2 is scheduled
> > * A-2 finishes
> > * A-3 is scheduled (because A is the first application created; as long
> >   as there are pods in the queue for application A, I understand that
> >   they should have priority over pods for application B)
> >
> > What I see instead is that after A-2 finishes, B-1 gets scheduled to run.
> >
> > Is this the expected behavior, and if so, can someone explain what is
> > wrong with my understanding?
> >
> > Additionally, in the logs for the scheduler pod, right after pod B-1 gets
> > scheduled, I see the following messages repeated thousands of times very
> > fast (over 2000 instances in about 0.25s according to the timestamps in
> > the log). Is this normal?
> >
> > 2021-02-05T20:25:34.479Z DEBUG scheduler/scheduling_application.go:641
> > skipping node for allocation: basic condition not satisfied {"node":
> > "local-node", "allocationKey": "258d9947-e92b-4967-9758-08eee62f4d1b",
> > "error": "pre alloc check: requested resource map[memory:1074 vcore:2000]
> > is larger than currently available map[ephemeral-storage:50977832921
> > hugepages-2Mi:0 memory:13250 pods:110 vcore:1600] resource on local-node"}
> >
> > 2021-02-05T20:25:34.479Z DEBUG scheduler/scheduling_node.go:271 requested
> > resource is larger than currently available node resources {"nodeID":
> > "local-node", "requested": "map[memory:1074 vcore:2000]", "available":
> > "map[ephemeral-storage:50977832921 hugepages-2Mi:0 memory:13250 pods:110
> > vcore:1600]"}
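P.S. In case it helps to reproduce, each test pod is created from a spec roughly like the one below (pod name, image, and sleep duration are placeholders for the A-1/A-2/B-1/A-3 variants; the applicationId label and schedulerName are my understanding of how YuniKorn picks up pods, so please double-check against the user guide):

    apiVersion: v1
    kind: Pod
    metadata:
      name: a-1
      labels:
        applicationId: "A"        # groups the pod into YuniKorn application A
    spec:
      schedulerName: yunikorn     # hand the pod to the YuniKorn scheduler
      restartPolicy: Never
      containers:
        - name: sleep
          image: busybox
          command: ["sleep", "10"]
          resources:
            requests:
              cpu: "2"
              memory: "1Gi"

I do not set a queue label on the pods, so the default namespace-to-queue mapping should apply.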

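P.P.S. I have not changed the scheduler config, but for completeness, my reading of the sorting policies doc is that the application sort policy would be set as a queue property in the queues.yaml of the YuniKorn ConfigMap, something like the snippet below. I have not verified the exact property name against my deployed version, so this is only how I understood the doc:

    # queues.yaml (sketch of what I believe the FIFO default corresponds to)
    partitions:
      - name: default
        queues:
          - name: root
            properties:
              application.sort.policy: fifo   # FIFO application ordering, the default as I understand it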