Hi all,

I do not think we should get 0.12.2 out, since we know we need to fix this bug. Having 0.12.2 out with a known issue doesn't sound better than withdrawing it and re-releasing 0.12.2 from a new 0.12.2-RC3. Can we just withdraw the IPMC vote and start 0.12.2-RC3 right away?

On Fri, Jan 21, 2022 at 7:39 AM Craig Condit <[email protected]> wrote:

> Chaoran, nice catch on this one. It's unfortunate that we didn't find it
> before cutting 0.12.2.
>
> I agree with Wilfred that we can add it to the release notes on the
> website, but that we should also backport the fix to 0.12.3. I can RM that
> release too, unless someone else wants to volunteer.
>
> - Craig
>
> > On Jan 21, 2022, at 12:44 AM, Wilfred Spiegelenburg <[email protected]> wrote:
> >
> > We have seen large numbers of people running and deploying. I have
> > opened a PR with the fix.
> > The scheduler should not get deleted, unless it is scaled down on purpose.
> > It should not get evicted either; it should run as a high-priority pod,
> > unless we missed that. A crash of the scheduler is a bug.
> >
> > We should let v0.12.2 go through as normal. In the release
> > announcement, we should have a section that points to known issues,
> > where we can reference the JIRA with the workaround.
> >
> > The workaround is as simple as a scale-down and scale-up. As long as
> > the admission controller is running, all pods will be pushed towards
> > the YuniKorn scheduler. We can start on the next release on the v0.12
> > branch. We should also add this case to our e2e tests.
> >
> > Wilfred
> >
> > On Fri, 21 Jan 2022 at 17:15, Weiwei Yang <[email protected]> wrote:
> >>
> >> Agreed, this needs to be fixed.
> >> We likely need to revoke 0.12.2 and put out a 0.12.3.
> >>
> >> On Thu, Jan 20, 2022 at 9:56 PM Chaoran Yu <[email protected]> wrote:
> >>
> >>> Yes, Helm install and upgrade both work.
> >>> The failure scenario is as follows:
> >>>
> >>> 1. Both the admission controller and the scheduler pods are running.
> >>> 2. The scheduler pod is restarted for some reason (e.g. deleted,
> >>>    evicted, or crashed).
> >>> 3. The new scheduler pod will be stuck in the Pending state because it
> >>>    is intercepted by the admission controller (its schedulerName field
> >>>    is set to yunikorn).
> >>>
> >>> I think this bug is critical because if the scheduler pod fails for any
> >>> reason, someone has to manually redeploy the whole thing.
> >>>
> >>>> On Jan 20, 2022, at 21:45, Weiwei Yang <[email protected]> wrote:
> >>>>
> >>>> Hmmm, that is a bug. But during the release verification, I tried the
> >>>> Helm install, and it works as expected. I am guessing that is because
> >>>> the scheduler always gets started first. Maybe the same is true for the
> >>>> upgrade? In that case, maybe this works as long as people use the Helm
> >>>> charts to deploy YuniKorn? Craig, could you please look into this and
> >>>> let us know if we need to revoke the vote for 0.12.2 and have a 0.12.3?
> >>>>
> >>>> Thank you, Chaoran, for raising this. Much appreciated!
> >>>>
> >>>> On Thu, Jan 20, 2022 at 5:00 PM Chaoran Yu <[email protected]> wrote:
> >>>>
> >>>>> I just spotted a bug, https://issues.apache.org/jira/browse/YUNIKORN-1038,
> >>>>> which is critical and worth porting back into branch 0.12.
> >>>>>
> >>>>> On Thu, Jan 20, 2022 at 12:12 PM Sunil Govindan <[email protected]> wrote:
> >>>>>
> >>>>>> A late +1 (binding) from me.
> >>>>>>
> >>>>>> I built this from source and:
> >>>>>> - Ran a basic Spark job
> >>>>>> - Verified the UI
> >>>>>> - Checked the signature
> >>>>>> - Checked the images
> >>>>>>
> >>>>>> Thanks
> >>>>>> Sunil
> >>>>>>
> >>>>>> On Wed, Jan 19, 2022 at 8:44 AM Craig Condit <[email protected]> wrote:
> >>>>>>
> >>>>>>> Hi all,
> >>>>>>>
> >>>>>>> The vote to release Apache YuniKorn (incubating) 0.12.2 RC2 has passed
> >>>>>>> with 3 binding +1 votes and 3 non-binding +1 votes.
> >>>>>>>
> >>>>>>> Vote thread:
> >>>>>>> https://lists.apache.org/thread/1gw0k0g5fy86r8ljnjttdco04w7z5j4j
> >>>>>>>
> >>>>>>> Thank you to all the members who helped verify this release. We will
> >>>>>>> move to IPMC voting shortly.
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Craig
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: [email protected]
> >>> For additional commands, e-mail: [email protected]
