All,

I was discussing this a bit more with Weiwei offline, and I think we are better off withdrawing 0.12.2 RC2 and releasing 0.12.2 RC3 instead. I will get a build together shortly and issue a new vote.
Thanks,
Craig

> On Jan 21, 2022, at 9:39 AM, Craig Condit <[email protected]> wrote:
>
> Chaoran, nice catch on this one. Unfortunate that we didn’t find it before
> cutting 0.12.2.
>
> I agree with Wilfred that we can add to the release notes on the website, but
> that we should back port to 0.12.3 as well. I can RM that release as well,
> unless someone else wants to volunteer.
>
> - Craig
>
>> On Jan 21, 2022, at 12:44 AM, Wilfred Spiegelenburg <[email protected]>
>> wrote:
>>
>> We have seen large numbers of people running and deploying. I have
>> opened a PR with the fix.
>> The scheduler should not get deleted, unless scaled down on purpose.
>> It should not get evicted either, it should run as a high priority pod
>> unless we missed that.
>> Crashing of the scheduler is a bug,
>>
>> We should let v0.12.2 go through as normal. In the release
>> announcement we should have a section that points to known issues and
>> we can reference the jira there with the workaround.
>>
>> The workaround is as simple as a scale down and scale up. As long as
>> the admission controller is running all pods will be pushed towards
>> the YuniKorn scheduler. We can start on a next release on the branch
>> v0.12. We should get this case as part of our e2e tests added.
>>
>> Wilfred
>>
>> On Fri, 21 Jan 2022 at 17:15, Weiwei Yang <[email protected]> wrote:
>>>
>>> Agree, this needs to be fixed.
>>> Likely we need to revoke 0.12.2 and get out a 0.12.3.
>>>
>>> On Thu, Jan 20, 2022 at 9:56 PM Chaoran Yu <[email protected]> wrote:
>>>
>>>> Yes, Helm install and upgrade both work.
>>>> The failure scenario is as follows:
>>>>
>>>> 1. Both the admission controller and the scheduler pods are running
>>>> 2. The scheduler pod is restarted for some reason (e.g. deleted, evicted,
>>>> or crashed)
>>>> 3. The new scheduler pod will be stuck in the pending state because it’s
>>>> intercepted by the admission controller (The schedulerName field is
>>>> yunikorn).
>>>>
>>>> I think this bug is critical because if the scheduler pod fails for any
>>>> reason, someone has to manually redeploy the whole thing.
>>>>
>>>>> On Jan 20, 2022, at 21:45, Weiwei Yang <[email protected]> wrote:
>>>>>
>>>>> Hmmm. that is a bug. But during the release verification, I have tried the
>>>>> helm install, and that works as expected. I am guessing that is because the
>>>>> scheduler always gets started first. Maybe the same for the upgrade? In
>>>>> this case, maybe this can work as long as people are using helm charts to
>>>>> deploy yunikorn? Craig, could you please look into this and let us know if
>>>>> we need to revoke the vote for 0.12.2 and have a 0.12.3?
>>>>>
>>>>> Thank you Chaoran to raise this up. Much appreciated!
>>>>>
>>>>> On Thu, Jan 20, 2022 at 5:00 PM Chaoran Yu <[email protected]> wrote:
>>>>>
>>>>>> I just spotted a bug https://issues.apache.org/jira/browse/YUNIKORN-1038.
>>>>>> which is critical and worth porting back into branch 0.12
>>>>>>
>>>>>> On Thu, Jan 20, 2022 at 12:12 PM Sunil Govindan <[email protected]> wrote:
>>>>>>
>>>>>>> A late +1 (binding) from me.
>>>>>>>
>>>>>>> I build this from source
>>>>>>> - Ran basic spark job
>>>>>>> - Verified UI
>>>>>>> - Checked signature.
>>>>>>> - Checked the images.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Sunil
>>>>>>>
>>>>>>> On Wed, Jan 19, 2022 at 8:44 AM Craig Condit <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> The vote to Release Apache YuniKorn (incubating) 0.12.2 RC2 has passed
>>>>>>>> with 3 binding +1 votes and 3 non-binding +1 votes.
>>>>>>>>
>>>>>>>> Vote thread:
>>>>>>>> https://lists.apache.org/thread/1gw0k0g5fy86r8ljnjttdco04w7z5j4j
>>>>>>>>
>>>>>>>> Thank you to all the members who helped verify this release. We will move
>>>>>>>> to IPMC voting shortly.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Craig
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: [email protected]
>>>> For additional commands, e-mail: [email protected]
