Hi all

I do not think we should release 0.12.2 when we know this bug needs to be fixed.
Shipping 0.12.2 with a known issue doesn't sound better than just
withdrawing it and re-releasing 0.12.2 from a new 0.12.2-RC3.
Can we just withdraw the IPMC vote and start 0.12.2-RC3 right away?

On Fri, Jan 21, 2022 at 7:39 AM Craig Condit <[email protected]> wrote:

> Chaoran, nice catch on this one. Unfortunate that we didn’t find it before
> cutting 0.12.2.
>
> I agree with Wilfred that we can add to the release notes on the website,
> but that we should back port to 0.12.3 as well. I can RM that release as
> well, unless someone else wants to volunteer.
>
> - Craig
>
>
>
> > On Jan 21, 2022, at 12:44 AM, Wilfred Spiegelenburg <[email protected]> wrote:
> >
> > We have seen large numbers of people running and deploying. I have
> > opened a PR with the fix.
> > The scheduler should not get deleted unless scaled down on purpose.
> > It should not get evicted either; it should run as a high-priority pod,
> > unless we missed that.
> > Crashing of the scheduler is a bug.
> >
> > We should let v0.12.2 go through as normal. In the release
> > announcement we should have a section that points to known issues and
> > we can reference the jira there with the workaround.
> >
> > The workaround is as simple as a scale down and scale up. As long as
> > the admission controller is running all pods will be pushed towards
> > the YuniKorn scheduler. We can start on the next release on branch
> > v0.12. We should add this case to our e2e tests.
> >
> > Wilfred
> >
> > On Fri, 21 Jan 2022 at 17:15, Weiwei Yang <[email protected]> wrote:
> >>
> >> Agree, this needs to be fixed.
> >> Likely we need to revoke 0.12.2 and get out a 0.12.3.
> >>
> >> On Thu, Jan 20, 2022 at 9:56 PM Chaoran Yu <[email protected]> wrote:
> >>
> >>> Yes, Helm install and upgrade both work.
> >>> The failure scenario is as follows:
> >>>
> >>> 1. Both the admission controller and the scheduler pods are running.
> >>> 2. The scheduler pod is restarted for some reason (e.g. deleted, evicted,
> >>>    or crashed).
> >>> 3. The new scheduler pod will be stuck in the pending state because it’s
> >>>    intercepted by the admission controller (the schedulerName field is
> >>>    yunikorn).
> >>>
> >>> I think this bug is critical because if the scheduler pod fails for any
> >>> reason, someone has to manually redeploy the whole thing.
> >>>
> >>>
> >>>> On Jan 20, 2022, at 21:45, Weiwei Yang <[email protected]> wrote:
> >>>>
> >>>> Hmm, that is a bug. But during the release verification, I tried the
> >>>> Helm install, and that worked as expected. I am guessing that is because
> >>>> the scheduler always gets started first. Maybe the same holds for the
> >>>> upgrade? In this case, maybe this can work as long as people are using
> >>>> Helm charts to deploy YuniKorn? Craig, could you please look into this
> >>>> and let us know if we need to revoke the vote for 0.12.2 and have a
> >>>> 0.12.3?
> >>>>
> >>>> Thank you, Chaoran, for raising this. Much appreciated!
> >>>>
> >>>> On Thu, Jan 20, 2022 at 5:00 PM Chaoran Yu <[email protected]> wrote:
> >>>>
> >>>>> I just spotted a bug,
> >>>>> https://issues.apache.org/jira/browse/YUNIKORN-1038, which is critical
> >>>>> and worth backporting to branch 0.12.
> >>>>>
> >>>>> On Thu, Jan 20, 2022 at 12:12 PM Sunil Govindan <[email protected]> wrote:
> >>>>>
> >>>>>> A late +1 (binding) from me.
> >>>>>>
> >>>>>> I built this from source
> >>>>>> - Ran a basic Spark job.
> >>>>>> - Verified the UI.
> >>>>>> - Checked the signature.
> >>>>>> - Checked the images.
> >>>>>>
> >>>>>> Thanks
> >>>>>> Sunil
> >>>>>>
> >>>>>> On Wed, Jan 19, 2022 at 8:44 AM Craig Condit <[email protected]> wrote:
> >>>>>>
> >>>>>>> Hi all,
> >>>>>>>
> >>>>>>> The vote to release Apache YuniKorn (incubating) 0.12.2 RC2 has passed
> >>>>>>> with 3 binding +1 votes and 3 non-binding +1 votes.
> >>>>>>>
> >>>>>>> Vote thread:
> >>>>>>> https://lists.apache.org/thread/1gw0k0g5fy86r8ljnjttdco04w7z5j4j
> >>>>>>>
> >>>>>>> Thank you to all the members who helped verify this release. We will
> >>>>>>> move to IPMC voting shortly.
> >>>>>>>
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Craig
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: [email protected]
> >>> For additional commands, e-mail: [email protected]
> >>>
> >>>
