Hi Mathijs,

Thank you for your interest. I saw that you are looking for potential mentors for the proposed project. FYI, we have also submitted some ideas to Apache, in case any of them interest you:
https://cwiki.apache.org/confluence/display/COMDEV/GSoC+2026+Ideas+list#GSoC2026Ideaslist-Beam

Thanks,
Yi

On Mon, Feb 23, 2026 at 11:32 PM Mathijs Deelen <[email protected]> wrote:

> Hello everyone,
>
> My name is Mathijs. I recently contributed PR #37379, where I fixed the
> E2BIG OS argument limit in the Python SDK worker boot sequence, and I am
> currently preparing a systems-focused proposal for GSoC 2026.
>
> While looking at Prism and reviewing the internal scheduling code, I
> noticed the TODO(lostluck) in 'minPendingTimestampLocked':
> "Can we figure out how to avoid checking every key on every watermark
> refresh?"
>
> Currently, this performs an O(n) linear scan over pendingByKeys inside a
> locked critical section on every watermark tick. I would like to propose a
> GSoC project to systematically resolve this and other stateful scheduling
> bottlenecks in Prism.
>
> The core of my proposal would focus on three phases:
> 1 - The Watermark Bottleneck: Replacing the O(n) linear scan with a
> key-indexed min-heap to reduce the refresh cost to O(log n), keeping the
> critical section as short as possible.
>
> 2 - Stateful Parallelism: Addressing the TODO in 'buildEventTimeBundle' by
> implementing configurable limits for keys per bundle and elements per key,
> optimizing how Prism schedules highly concurrent stateful stages.
>
> 3 - Regression Infrastructure: Building a permanent benchmarking suite for
> Prism's scheduling engine to formally characterize these improvements and
> potentially catch future O(n) regressions in the hot path.
>
> I recognize that you may have already prototyped a heap-based approach or
> have a different data structure in mind for this. If so, I'd rather build
> on your thinking than duplicate work. Are these scheduling optimizations
> currently an active priority for the core team? If so, and if a maintainer
> has the time and interest to mentor it, I would love to formally draft this
> out as my GSoC proposal.
>
> Also, I would love to join the ASF Slack workspace to connect with the
> community. Could someone please send an invite link to this email address?
>
> Thank you for your time,
> Mathijs Deelen
>
> Github: mathdee
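
For readers following the thread, the key-indexed min-heap mentioned in phase 1 of the quoted proposal might look roughly like the sketch below. It is not taken from Prism: the names `entry`, `keyHeap`, `Set`, `Remove`, and `Min` are hypothetical, keys are simplified to strings, and timestamps to `int64`. It uses only the standard `container/heap` package and assumes Go 1.18 or later for `any`.

```go
package main

import (
	"container/heap"
	"fmt"
)

// entry is a hypothetical record: one key and its minimum pending timestamp.
type entry struct {
	key string
	ts  int64
	idx int // current position in the heap slice, maintained by Swap/Push
}

// keyHeap is a min-heap ordered by ts, plus a map from key to entry so a
// single key can be updated or removed without scanning every key.
type keyHeap struct {
	items []*entry
	byKey map[string]*entry
}

func newKeyHeap() *keyHeap { return &keyHeap{byKey: map[string]*entry{}} }

// heap.Interface implementation.
func (h *keyHeap) Len() int           { return len(h.items) }
func (h *keyHeap) Less(i, j int) bool { return h.items[i].ts < h.items[j].ts }
func (h *keyHeap) Swap(i, j int) {
	h.items[i], h.items[j] = h.items[j], h.items[i]
	h.items[i].idx = i
	h.items[j].idx = j
}
func (h *keyHeap) Push(x any) {
	e := x.(*entry)
	e.idx = len(h.items)
	h.items = append(h.items, e)
}
func (h *keyHeap) Pop() any {
	n := len(h.items)
	e := h.items[n-1]
	h.items[n-1] = nil // drop the reference so it can be collected
	h.items = h.items[:n-1]
	return e
}

// Set inserts a key or updates its timestamp in O(log n).
func (h *keyHeap) Set(key string, ts int64) {
	if e, ok := h.byKey[key]; ok {
		e.ts = ts
		heap.Fix(h, e.idx)
		return
	}
	e := &entry{key: key, ts: ts}
	h.byKey[key] = e
	heap.Push(h, e)
}

// Remove deletes a key (for example, once it has no pending elements) in O(log n).
func (h *keyHeap) Remove(key string) {
	e, ok := h.byKey[key]
	if !ok {
		return
	}
	heap.Remove(h, e.idx)
	delete(h.byKey, key)
}

// Min returns the smallest pending timestamp in O(1); ok is false when empty.
func (h *keyHeap) Min() (ts int64, ok bool) {
	if len(h.items) == 0 {
		return 0, false
	}
	return h.items[0].ts, true
}

func main() {
	h := newKeyHeap()
	h.Set("a", 30)
	h.Set("b", 10)
	h.Set("c", 20)
	fmt.Println(h.Min()) // 10 true
	h.Set("b", 40)       // key "b" advances past the others
	fmt.Println(h.Min()) // 20 true
	h.Remove("c")
	fmt.Println(h.Min()) // 30 true
}
```

In this sketch, reading the minimum is O(1) and the O(log n) cost is paid when a key's pending timestamp changes, so a watermark refresh would not need to visit every key. Whether that trade-off fits Prism's actual locking and update patterns is something the proposal would have to establish.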
