Hey Wilfred,

Yes, I'll take on the role of release manager.
I cherry-picked YUNIKORN-2520 to branch-1.5.

Regarding the remaining JIRAs, I asked PoAn Yang on Slack to take a look at
YUNIKORN-2057 as he originally volunteered to solve it. I told him that it
was not urgent, but depending on how quickly he makes progress, we might
reconsider our position later.

Peter

On Mon, Apr 29, 2024 at 5:00 AM Wilfred Spiegelenburg <wilfr...@apache.org>
wrote:

> Peter,
>
> Thank you for starting this discussion. See inline for further comments.
>
> > Hi all,
> >
> > Due to the number of problems that we have discovered since the release
> > of 1.5.0, I believe it makes sense to create a new Yunikorn release which
> > consists of bug fixes only. If I'm not mistaken we haven't done this
> > before (at least since leaving the ASF incubator), so this would be the
> > first minor Yunikorn release.
>
> +1
> I am totally for releasing YuniKorn 1.5.1 with the lock fixes.
> Looking at all the work you have done for this release: would you be
> willing to also step up as a release manager for the 1.5.1 release?
>
> > There are a bunch of fixes that are already on branch-1.5:
> >
> >    - YUNIKORN-2521 Scheduler deadlock (resolved indirectly by
> >    YUNIKORN-2544)
> >    - YUNIKORN-2539 Add optional deadlock detection
> >    - YUNIKORN-2544 [UMBRELLA] Fix Yunikorn potential locking issues
> >       - YUNIKORN-2543 Fix locking in RMProxy
> >       - YUNIKORN-2545 Eliminate multiple lock calls from Queue
> >       - YUNIKORN-2548 Potential deadlock during concurrent
> >       bottom-up/top-down queue traversal
> >       - YUNIKORN-2550 Fix locking in PartitionContext
> >       - YUNIKORN-2552 Recursive locking when sending remove queue event
> >       - YUNIKORN-2553 [core] Enable deadlock detection during unit tests
> >       - YUNIKORN-2563 [shim] Enable deadlock detection during unit tests
> >       - YUNIKORN-2574 totalPartitionResource should not be mutated with
> >       AddTo/SubFrom
> >       - YUNIKORN-2562 Nil pointer panic in
> >       Application.ReplaceAllocation()
> >
>
> Yes for all the above.
>
> > The following is In Progress for 1.5.1:
> >
> >    - YUNIKORN-2526 Discrepancy between shim cache and core app/task list
> >    after scheduler restart
>
> This would be a good one to get in if we have some progress on this.
> Do we understand what is going on yet? I looked at the jira and am not
> sure if we understand the root cause.
>
> > Candidates:
> >
> >    - YUNIKORN-2520 PVC errors in AssumePod() are not handled properly -
> >    Resolved, only cherry-picking is needed
>
> Yes, this could be added.
>
> I also think we need to check if we have any CVE fixes that need to be
> added.
> A quick check shows these:
> * golang.org/x/net 0.23 (CVE-2023-45288 or GO-2024-2687 via YUNIKORN-2541)
> * google.golang.org/protobuf to v1.33.0 (CVE-2024-24786 via YUNIKORN-2469)
> * build with golang 1.21.9
>
> To satisfy the scanners, although we are not affected:
> * K8s 1.29.4 (CVE-2024-3177)
>
>
> >    - YUNIKORN-2057 FindQueueByAppID is slow - Critical priority, "In
> >    progress" since Oct 2023
> >    - YUNIKORN-1089 Application handling with invalid task group
> >    annotations - Critical priority, no progress
> >    - YUNIKORN-1988 Preemption happens when a queue lower than its
> >    guaranteed capacity - Critical priority, "In progress" since Sep 2023
>
> No for the last 3 mentioned. We did not block the 1.5.0 release on
> these and they have not made enough progress since then.
> I would not consider them as a possible candidate for 1.5.1
>
> Wilfred
>
> >
> > Thoughts, opinions? What should be the scope of 1.5.1?
> >
> > Thanks,
> > Peter
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org
> For additional commands, e-mail: dev-h...@yunikorn.apache.org
>
>
