All,

I was discussing this a bit more with Weiwei offline, and I think we are better 
off withdrawing 0.12.2 RC2 and releasing 0.12.2 RC3 instead. I will get a build 
together shortly and issue a new vote.

Thanks,

Craig


> On Jan 21, 2022, at 9:39 AM, Craig Condit <[email protected]> wrote:
> 
> Chaoran, nice catch on this one. Unfortunate that we didn’t find it before 
> cutting 0.12.2.
> 
> I agree with Wilfred that we can add to the release notes on the website, but 
> that we should back port to 0.12.3 as well. I can RM that release as well, 
> unless someone else wants to volunteer.
> 
> - Craig
> 
> 
> 
>> On Jan 21, 2022, at 12:44 AM, Wilfred Spiegelenburg <[email protected]> 
>> wrote:
>> 
>> We have seen large numbers of people running and deploying. I have
>> opened a PR with the fix.
>> The scheduler should not get deleted, unless scaled down on purpose.
>> It should not get evicted either; it should run as a high-priority pod,
>> unless we missed that.
>> Crashing of the scheduler is a bug.
>> 
>> We should let v0.12.2 go through as normal. In the release
>> announcement we should have a section that points to known issues and
>> we can reference the jira there with the workaround.
>> 
>> The workaround is as simple as a scale down and scale up. As long as
>> the admission controller is running all pods will be pushed towards
>> the YuniKorn scheduler. We can start on a next release on the branch
>> v0.12. We should also add this case to our e2e tests.
>> 
>> Wilfred
>> 
>> On Fri, 21 Jan 2022 at 17:15, Weiwei Yang <[email protected]> wrote:
>>> 
>>> Agree, this needs to be fixed.
>>> Likely we need to revoke 0.12.2 and get out a 0.12.3.
>>> 
>>> On Thu, Jan 20, 2022 at 9:56 PM Chaoran Yu <[email protected]> wrote:
>>> 
>>>> Yes, Helm install and upgrade both work.
>>>> The failure scenario is as follows:
>>>> 
>>>> 1. Both the admission controller and the scheduler pods are running
>>>> 2. The scheduler pod is restarted for some reason (e.g. deleted, evicted,
>>>> or crashed)
>>>> 3. The new scheduler pod will be stuck in the pending state because it’s
>>>> intercepted by the admission controller (the schedulerName field is
>>>> yunikorn).
>>>> 
>>>> I think this bug is critical because if the scheduler pod fails for any
>>>> reason, someone has to manually redeploy the whole thing.
>>>> 
>>>> 
>>>>> On Jan 20, 2022, at 21:45, Weiwei Yang <[email protected]> wrote:
>>>>> 
>>>>> Hmm, that is a bug. But during the release verification, I tried the
>>>>> helm install, and that works as expected. I am guessing that is because the
>>>>> scheduler always gets started first. Maybe the same for the upgrade? In
>>>>> this case, maybe this can work as long as people are using helm charts to
>>>>> deploy yunikorn? Craig, could you please look into this and let us know if
>>>>> we need to revoke the vote for 0.12.2 and have a 0.12.3?
>>>>> 
>>>>> Thank you, Chaoran, for raising this. Much appreciated!
>>>>> 
>>>>> On Thu, Jan 20, 2022 at 5:00 PM Chaoran Yu <[email protected]> wrote:
>>>>> 
>>>>>> I just spotted a bug, https://issues.apache.org/jira/browse/YUNIKORN-1038,
>>>>>> which is critical and worth porting back into branch 0.12.
>>>>>> 
>>>>>> On Thu, Jan 20, 2022 at 12:12 PM Sunil Govindan <[email protected]> wrote:
>>>>>> 
>>>>>>> A late +1 (binding) from me.
>>>>>>> 
>>>>>>> I built this from source
>>>>>>> - Ran basic spark job
>>>>>>> - Verified UI
>>>>>>> - Checked signature.
>>>>>>> - Checked the images.
>>>>>>> 
>>>>>>> Thanks
>>>>>>> Sunil
>>>>>>> 
>>>>>>> On Wed, Jan 19, 2022 at 8:44 AM Craig Condit <[email protected]>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Hi all,
>>>>>>>> 
>>>>>>>> The vote to Release Apache YuniKorn (incubating) 0.12.2 RC2 has passed
>>>>>>>> with 3 binding +1 votes and 3 non-binding +1 votes.
>>>>>>>> 
>>>>>>>> Vote thread:
>>>>>>>> https://lists.apache.org/thread/1gw0k0g5fy86r8ljnjttdco04w7z5j4j
>>>>>>>> 
>>>>>>>> Thank you to all the members who helped verify this release. We will move
>>>>>>>> to IPMC voting shortly.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Craig
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>> 
>>>> 
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: [email protected]
>>>> For additional commands, e-mail: [email protected]
>>>> 
>>>> 
>> 
>> 
> 
> 
> 


