Hi Patrick,

Sorry for the confusion. What I meant to say was the following case:

1. Submit job A
2. Submit job B
3. Both A and B run to completion (all pods terminated and deleted)
4. After some time:
5. Submit job B
6. Submit job A

At this point you would expect job B to get scheduled first. However, since the scheduler internally still thinks A and B are running (due to the limitation I mentioned in the previous comment), YuniKorn still schedules A first and B second. But if you restart the scheduler, I think you'll see the expected behavior, because the scheduler then sees B first and A second.

On Tue, Feb 9, 2021 at 9:02 AM Patrick, Alton (US) <[email protected]> wrote:

> I'm sorry, I don't understand this. The behavior you are describing ("The
> FIFO order honors the old app "A" order") is the behavior I was expecting,
> but not what I'm seeing. I would think that means that any pod with appID A
> should always be scheduled before a pod with appID B, since application A
> was created first. But I'm seeing that sometimes pods with appID A get
> scheduled after pods with appID B.
>
> > -----Original Message-----
> > From: Weiwei Yang <[email protected]>
> > Sent: Saturday, February 6, 2021 7:36 PM
> > To: [email protected]; [email protected]
> > Subject: RE: Unexpected scheduler behavior with K8s
> >
> > Hi Patrick
> >
> > I assume you are using the YuniKorn 0.9 build; we have a limitation in
> > this version. When a job finishes, the scheduler does not mark it as
> > completed accordingly. This is because in K8s there is nowhere to track
> > app state: YuniKorn tracks it purely by grouping pods on the
> > applicationID label. That means if you submit pods with appID "A", the
> > pods finish, and after some time you submit pods with appID "A" again,
> > the scheduler still considers them part of the old app "A". The FIFO
> > order honors the old app "A" position. This is probably the reason why
> > you see a different result after a restart.
> >
> > In the upcoming release, we are trying to fix this in
> > https://issues.apache.org/jira/browse/YUNIKORN-483. We haven't tried K3s
> > before, but personally I don't think that matters.
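[Editor's note: the stale-app limitation described above can be illustrated with a minimal sketch. This is not YuniKorn's actual code; the `Registry` class and its methods are hypothetical, modeling only the behavior Weiwei describes: apps are keyed by their applicationID label, so a resubmitted ID reuses the old entry and keeps its original FIFO position.]

```python
# Sketch (not YuniKorn source) of FIFO app ordering when finished apps
# are never removed from the scheduler's internal registry.
import itertools

_counter = itertools.count()

class Registry:
    def __init__(self):
        # applicationID -> first-seen submission order
        self.apps = {}

    def submit(self, app_id):
        # A resubmitted app_id keeps its old position: this models the
        # limitation, since the scheduler only groups pods by the
        # applicationID label and never marks the old app completed.
        self.apps.setdefault(app_id, next(_counter))

    def fifo_order(self):
        return sorted(self.apps, key=self.apps.get)

reg = Registry()
reg.submit("A")          # first run
reg.submit("B")
# ... both finish; pods are deleted, but the app entries remain ...
reg.submit("B")          # resubmission: "B" keeps position 1
reg.submit("A")          # resubmission: "A" keeps position 0
print(reg.fifo_order())  # -> ['A', 'B'], although B was resubmitted first
```

A restarted scheduler starts with an empty registry, which is why the resubmission order (B, then A) is honored only after a restart.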
> >
> > Hope this helps
> > Thanks
> >
> > --
> > Weiwei
> >
> > On Feb 5, 2021, 6:56 PM -0800, Patrick, Alton (US)
> > <[email protected]>, wrote:
> > > Thanks for your answer. You are right -- this is a single-node
> > > cluster, and I was just relying on the limits on the node. I was not
> > > setting a queue quota.
> > >
> > > You are also right that when I use annotations on the namespace to
> > > set queue quotas, it works as expected: A-3 gets scheduled before B-1.
> > >
> > > While trying this some more, I also noticed that even when I do *not*
> > > set a queue quota, the order is as expected (A-3, then B-1) after
> > > restarting Yunikorn. But on subsequent runs, the order is always B-1,
> > > then A-3.
> > >
> > > Is it required to have queue quotas to make this work in the expected
> > > way? Or should it also work if resources are only constrained at the
> > > node or cluster level?
> > >
> > > I should probably mention: I am doing this in a cluster created with
> > > Rancher 2.5.5 and running k3s v1.18.8+k3s1. I don't know if that
> > > matters or not.
> > >
> > > > -----Original Message-----
> > > > From: Weiwei Yang <[email protected]>
> > > > Sent: Friday, February 5, 2021 4:47 PM
> > > > To: [email protected]
> > > > Subject: Re: Unexpected scheduler behavior with K8s
> > > >
> > > > *** WARNING ***
> > > > EXTERNAL EMAIL -- This message originates from outside our organization.
> > > >
> > > > Hi Patrick
> > > >
> > > > Thanks for reaching out.
> > > > I have one question about how you limit there to be only 2 pods
> > > > running at a time. Are you running this on a single-node cluster,
> > > > and limiting that at the node resource level?
> > > > Based on the use case you described, my guess is that the scheduler
> > > > creates reservations for B-1 and A-3, and when those reservations
> > > > are unreserved, the FIFO ordering is not strictly honored.
> > > > Have you set any queue quota?
> > > > Here is the doc about how to set the queue mapping with a quota set:
> > > > http://yunikorn.apache.org/docs/next/user_guide/resource_quota_management#namespace-to-queue-mapping
> > > > If we limit the running pods by quota, then I think we will see the
> > > > expected behavior.
> > > > The document about the app sorting policy is here:
> > > > http://yunikorn.apache.org/docs/next/user_guide/sorting_policies#application-sorting
> > > >
> > > > Weiwei
> > > >
> > > > On Fri, Feb 5, 2021 at 1:04 PM Patrick, Alton (US)
> > > > <[email protected]> wrote:
> > > >
> > > > > Hi. I have just started using Yunikorn for scheduling with K8s. I
> > > > > have been doing some simple experiments to make sure I understand
> > > > > how it works. Most of them are working as expected, but there is
> > > > > one I don't understand.
> > > > >
> > > > > I used Helm to deploy Yunikorn and I did not modify anything, so I
> > > > > have the default setup: one queue per namespace, and I assume the
> > > > > application sort policy is the default of "FifoSortPolicy".
> > > > >
> > > > > I create four pods. All of them have the same resource requests
> > > > > (2 cores, 1Gi mem), and the resource requests are such that only
> > > > > two can run at a time. The pods are created in this order, with a
> > > > > 1-second gap between each:
> > > > >
> > > > > 1. A-1, applicationId = A, sleeps for 10s
> > > > > 2. A-2, applicationId = A, sleeps for 5s
> > > > > 3. B-1, applicationId = B, sleeps for 5s
> > > > > 4.
> > > > >    A-3, applicationId = A, sleeps for 5s
> > > > >
> > > > > What I expect to see is:
> > > > > * A-1 is scheduled
> > > > > * A-2 is scheduled
> > > > > * A-2 finishes
> > > > > * A-3 is scheduled (because A is the first application created; as
> > > > >   long as there are pods in the queue for application A, I
> > > > >   understand that they should have priority over pods for
> > > > >   application B)
> > > > >
> > > > > What I see instead is that after A-2 finishes, B-1 gets scheduled
> > > > > to run.
> > > > >
> > > > > Is this the expected behavior, and if so, can someone explain what
> > > > > is wrong with my understanding?
> > > > >
> > > > > Additionally, in the logs for the scheduler pod, right after pod
> > > > > B-1 gets scheduled, I see the following messages repeated
> > > > > thousands of times very fast (over 2000 instances in about 0.25s
> > > > > according to timestamps in the log). Is this normal?
> > > > >
> > > > > 2021-02-05T20:25:34.479Z DEBUG scheduler/scheduling_application.go:641
> > > > > skipping node for allocation: basic condition not satisfied {"node":
> > > > > "local-node", "allocationKey": "258d9947-e92b-4967-9758-08eee62f4d1b",
> > > > > "error": "pre alloc check: requested resource map[memory:1074
> > > > > vcore:2000] is larger than currently available
> > > > > map[ephemeral-storage:50977832921 hugepages-2Mi:0 memory:13250
> > > > > pods:110 vcore:1600] resource on local-node"}
> > > > > 2021-02-05T20:25:34.479Z DEBUG scheduler/scheduling_node.go:271
> > > > > requested resource is larger than currently available node resources
> > > > > {"nodeID": "local-node", "requested": "map[memory:1074 vcore:2000]",
> > > > > "available": "map[ephemeral-storage:50977832921 hugepages-2Mi:0
> > > > > memory:13250 pods:110 vcore:1600]"}
> > > > >
> > > > > ---------------------------------------------------------------------
> > > > > To unsubscribe, e-mail: [email protected]
> > > > > For additional commands, e-mail: [email protected]
