Hi Mathijs,

Thank you for your interest. I saw that you are looking for potential
mentors for the proposed project. FYI, we have also submitted some ideas to
the Apache GSoC ideas list, in case any of them interest you:
https://cwiki.apache.org/confluence/display/COMDEV/GSoC+2026+Ideas+list#GSoC2026Ideaslist-Beam

Thanks,

Yi


On Mon, Feb 23, 2026 at 11:32 PM Mathijs Deelen <[email protected]>
wrote:

> Hello everyone,
>
> My name is Mathijs. I recently contributed PR #37379, in which I fixed the
> E2BIG OS argument limit error in the Python SDK worker boot sequence, and
> I am currently preparing a systems-focused proposal for GSoC 2026.
>
> While looking at Prism and reviewing the internal scheduling code, I
> noticed the TODO(lostluck) in 'minPendingTimestampLocked':
> "Can we figure out how to avoid checking every key on every watermark
> refresh?"
>
> Currently, this performs an O(n) linear scan over pendingByKeys inside a
> locked critical section on every watermark tick. I would like to propose a
> GSoC project to systematically resolve this and other stateful scheduling
> bottlenecks in Prism.
>
> The core of my proposal would focus on three phases:
> 1 - The Watermark Bottleneck: Replacing the O(n) linear scan with a
> key-indexed min-heap, so that each per-key update costs O(log n) and
> reading the minimum costs O(1), keeping the critical section as short as
> possible.
>
> 2 - Stateful Parallelism: Addressing the TODO in 'buildEventTimeBundle' by
> implementing configurable limits for keys per bundle and elements per key,
> optimizing how Prism schedules highly concurrent stateful stages.
>
> 3 - Regression Infrastructure: Building a permanent benchmarking suite for
> Prism's scheduling engine to formally characterize these improvements and
> potentially catch future O(n) regressions in the hot path.
>
> I recognize that you may have already prototyped a heap-based approach or
> have a different data structure in mind for this. If so, I'd rather build
> on your thinking than duplicate work. Are these scheduling optimizations
> currently an active priority for the core team? If so, and if a maintainer
> has the time and interest to mentor it, I would love to formally draft this
> out as my GSoC proposal.
>
> Also, I would love to join the ASF Slack workspace to connect with the
> community. Could someone please send an invite link to this email address?
>
> Thank you for your time,
> Mathijs Deelen
>
> GitHub: mathdee
>
>
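
For readers following the thread, the key-indexed min-heap from phase 1 of
the quoted proposal could look roughly like the sketch below. It is only an
illustration built on Go's standard container/heap package. Every name in it
(entry, tsHeap, KeyedMinHeap, Set, Remove, Min) is hypothetical and does not
mirror Prism's actual types or the real 'minPendingTimestampLocked' code. It
also assumes each key contributes a single int64 timestamp. Locking is left
out, so a caller would still hold whatever mutex guards the pending state.

```go
package main

import "container/heap"

// entry holds one key's earliest pending timestamp.
// Hypothetical type for illustration; not taken from Prism.
type entry struct {
	key   string
	ts    int64 // earliest pending event time for this key
	index int   // position in the heap slice, kept current by Swap and Push
}

// tsHeap implements heap.Interface, ordered by ascending timestamp.
type tsHeap []*entry

func (h tsHeap) Len() int           { return len(h) }
func (h tsHeap) Less(i, j int) bool { return h[i].ts < h[j].ts }
func (h tsHeap) Swap(i, j int) {
	h[i], h[j] = h[j], h[i]
	h[i].index = i
	h[j].index = j
}

func (h *tsHeap) Push(x any) {
	e := x.(*entry)
	e.index = len(*h)
	*h = append(*h, e)
}

func (h *tsHeap) Pop() any {
	old := *h
	n := len(old)
	e := old[n-1]
	old[n-1] = nil // drop the reference so it can be collected
	*h = old[:n-1]
	return e
}

// KeyedMinHeap tracks the minimum timestamp across all keys.
type KeyedMinHeap struct {
	h     tsHeap
	byKey map[string]*entry // key -> heap entry, for O(1) lookup
}

func NewKeyedMinHeap() *KeyedMinHeap {
	return &KeyedMinHeap{byKey: map[string]*entry{}}
}

// Set inserts key, or updates its timestamp, in O(log n).
func (k *KeyedMinHeap) Set(key string, ts int64) {
	if e, ok := k.byKey[key]; ok {
		e.ts = ts
		heap.Fix(&k.h, e.index) // restore heap order after the change
		return
	}
	e := &entry{key: key, ts: ts}
	k.byKey[key] = e
	heap.Push(&k.h, e)
}

// Remove drops key in O(log n). It is a no-op if key is absent.
func (k *KeyedMinHeap) Remove(key string) {
	e, ok := k.byKey[key]
	if !ok {
		return
	}
	heap.Remove(&k.h, e.index)
	delete(k.byKey, key)
}

// Min returns the smallest timestamp in O(1); ok is false when empty.
func (k *KeyedMinHeap) Min() (ts int64, ok bool) {
	if len(k.h) == 0 {
		return 0, false
	}
	return k.h[0].ts, true
}
```

In this sketch a watermark refresh would only call Min(), while Set and
Remove would run when a key's pending elements change, so the O(log n) work
moves from every refresh to each per-key update. Whether that trade-off
helps in practice depends on how often keys change relative to refreshes,
which is what the proposed benchmarks would need to measure.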
