Peter,

Thank you for starting this discussion. See inline for further comments.

> Hi all,
>
> Due to the number of problems that we have discovered since the release of
> 1.5.0, I believe it makes sense to create a new Yunikorn release which
> consists of bug fixes only. If I'm not mistaken we haven't done this before
> (at least since leaving the ASF incubator), so this would be the first
> minor Yunikorn release.

+1
I am totally for releasing YuniKorn 1.5.1 with the lock fixes.
Looking at all the work you have done for this release: would you be
willing to also step up as a release manager for the 1.5.1 release?

> There are a bunch of fixes that are already on branch-1.5:
>
>    - YUNIKORN-2521 Scheduler deadlock (resolved indirectly by YUNIKORN-2544)
>    - YUNIKORN-2539 Add optional deadlock detection
>    - YUNIKORN-2544 [UMBRELLA] Fix Yunikorn potential locking issues
>       - YUNIKORN-2543 Fix locking in RMProxy
>       - YUNIKORN-2545 Eliminate multiple lock calls from Queue
>       - YUNIKORN-2548 Potential deadlock during concurrent
>       bottom-up/top-down queue traversal
>       - YUNIKORN-2550 Fix locking in PartitionContext
>       - YUNIKORN-2552 Recursive locking when sending remove queue event
>       - YUNIKORN-2553 [core] Enable deadlock detection during unit tests
>       - YUNIKORN-2563 [shim] Enable deadlock detection during unit tests
>       - YUNIKORN-2574 totalPartitionResource should not be mutated with
>       AddTo/SubFrom
>       - YUNIKORN-2562 Nil pointer panic in Application.ReplaceAllocation()
>

Yes for all the above.

> The following is In Progress for 1.5.1:
>
>    - YUNIKORN-2526 Discrepancy between shim cache and core app/task list
>    after scheduler restart

This would be a good one to get in if we are making progress on it.
Do we understand what is going on yet? I looked at the Jira and am not
sure we have identified the root cause.

> Candidates:
>
>    - YUNIKORN-2520 PVC errors in AssumePod() are not handled properly -
>    Resolved, only cherry-picking is needed

Yes, this could be added.

I also think we need to check if we have any CVE fixes that need to be added.
A quick check shows the following:
* golang.org/x/net 0.23 (CVE-2023-45288 or GO-2024-2687 via YUNIKORN-2541)
* google.golang.org/protobuf to v1.33.0 (CVE-2024-24786 via YUNIKORN-2469)
* build with golang 1.21.9

To satisfy the scanners, although we are not affected:
* K8s 1.29.4 (CVE-2024-3177)


>    - YUNIKORN-2057 FindQueueByAppID is slow - Critical priority, "In
>    progress" since Oct 2023
>    - YUNIKORN-1089 Application handling with invalid task group annotations
>    - Critical priority, no progress
>    - YUNIKORN-1988 Preemption happens when a queue lower than its
>    guaranteed capacity - Critical priority, "In progress" since Sep 2023

No for the last 3 mentioned. We did not block the 1.5.0 release on
these and they have not made enough progress since then.
I would not consider them candidates for 1.5.1.

Wilfred

>
> Thoughts, opinions? What should be the scope of 1.5.1?
>
> Thanks,
> Peter
