Hello everyone,

My name is Mathijs. I recently contributed PR #37379, which fixes the E2BIG
OS argument-length limit being hit in the Python SDK worker boot sequence,
and I am currently preparing a systems-focused proposal for GSoC 2026.

While looking at Prism and reviewing its internal scheduling code, I noticed
the TODO(lostluck) in 'minPendingTimestampLocked':

  "Can we figure out how to avoid checking every key on every watermark
  refresh?"

Currently, this performs an O(n) linear scan over pendingByKeys (n being the
number of keys with pending elements) inside a locked critical section on
every watermark refresh. I would like to propose a GSoC project to
systematically resolve this and other stateful scheduling bottlenecks in
Prism.

The core of my proposal would focus on three phases:
    
1 - The Watermark Bottleneck: Replacing the O(n) linear scan with a key-indexed
min-heap, so that reading the minimum pending timestamp is O(1) and each
per-key update is O(log n), keeping the critical section as short as possible.

2 - Stateful Parallelism: Addressing the TODO in 'buildEventTimeBundle' by
implementing configurable limits on keys per bundle and elements per key, to
improve how Prism schedules highly concurrent stateful stages.

3 - Regression Infrastructure: Building a permanent benchmarking suite for 
Prism's scheduling engine to formally characterize these improvements and 
potentially catch future O(n) regressions in the hot path.
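To make phase 1 more concrete, here is a minimal sketch of a key-indexed
min-heap built on Go's container/heap. All names (pendingIndex, keyMin, Set,
Remove, Min) are hypothetical, timestamps are plain int64 values rather than
Prism's own types, and it assumes the caller already knows each key's own
minimum pending timestamp. The key-to-position map is what lets one key be
updated or removed in O(log n) via heap.Fix and heap.Remove, while the global
minimum is read from the root in O(1).

```go
package main

import (
	"container/heap"
	"fmt"
)

// keyMin is one heap entry: the minimum pending timestamp of a single key.
// int64 timestamps are an assumption, not Prism's actual time type.
type keyMin struct {
	key   string
	ts    int64
	index int // current position in the heap slice, kept up to date by Swap/Push/Pop
}

// keyHeap implements heap.Interface, ordered by ascending timestamp.
type keyHeap []*keyMin

func (h keyHeap) Len() int           { return len(h) }
func (h keyHeap) Less(i, j int) bool { return h[i].ts < h[j].ts }
func (h keyHeap) Swap(i, j int) {
	h[i], h[j] = h[j], h[i]
	h[i].index = i
	h[j].index = j
}
func (h *keyHeap) Push(x any) {
	e := x.(*keyMin)
	e.index = len(*h)
	*h = append(*h, e)
}
func (h *keyHeap) Pop() any {
	old := *h
	n := len(old)
	e := old[n-1]
	old[n-1] = nil // avoid retaining the entry
	e.index = -1
	*h = old[:n-1]
	return e
}

// pendingIndex is a hypothetical replacement for scanning every key:
// a min-heap of per-key minimums plus a map from key to heap entry.
type pendingIndex struct {
	h     keyHeap
	byKey map[string]*keyMin
}

func newPendingIndex() *pendingIndex {
	return &pendingIndex{byKey: make(map[string]*keyMin)}
}

// Set inserts or updates the minimum pending timestamp for key in O(log n).
func (p *pendingIndex) Set(key string, ts int64) {
	if e, ok := p.byKey[key]; ok {
		e.ts = ts
		heap.Fix(&p.h, e.index)
		return
	}
	e := &keyMin{key: key, ts: ts}
	heap.Push(&p.h, e)
	p.byKey[key] = e
}

// Remove drops a key that no longer has pending elements, in O(log n).
func (p *pendingIndex) Remove(key string) {
	if e, ok := p.byKey[key]; ok {
		heap.Remove(&p.h, e.index)
		delete(p.byKey, key)
	}
}

// Min returns the smallest pending timestamp across all keys in O(1).
func (p *pendingIndex) Min() (int64, bool) {
	if len(p.h) == 0 {
		return 0, false
	}
	return p.h[0].ts, true
}

func main() {
	p := newPendingIndex()
	p.Set("a", 30)
	p.Set("b", 10)
	p.Set("c", 20)
	m, _ := p.Min()
	fmt.Println(m) // 10
	p.Remove("b")
	m, _ = p.Min()
	fmt.Println(m) // 20
	p.Set("a", 5)
	m, _ = p.Min()
	fmt.Println(m) // 5
}
```

The trade-off is that the heap must be kept in sync whenever a key's pending
elements change, which moves work off the watermark refresh rather than
eliminating it; the phase 3 benchmarks would be the place to confirm that this
pays off.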
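Similarly, the two phase 2 limits could look roughly like this sketch. The
names element, bundleLimits and buildBundle are hypothetical and do not
reflect Prism's actual types or the real signature of buildEventTimeBundle.
Visiting keys in order of their earliest pending timestamp is also an assumed
policy, not necessarily the one Prism would use. Elements that do not fit stay
pending for a later bundle.

```go
package main

import (
	"fmt"
	"sort"
)

// element is a hypothetical pending element: a key plus an event timestamp.
type element struct {
	key string
	ts  int64
}

// bundleLimits holds the two proposed knobs. Zero means "no limit".
type bundleLimits struct {
	maxKeysPerBundle  int
	maxElementsPerKey int
}

// buildBundle takes elements for one bundle out of pendingByKey, honoring both
// limits. It assumes each key's slice is already sorted by timestamp.
// Selected elements are removed from pendingByKey; the rest stay pending.
func buildBundle(pendingByKey map[string][]element, lim bundleLimits) []element {
	keys := make([]string, 0, len(pendingByKey))
	for k, els := range pendingByKey {
		if len(els) > 0 {
			keys = append(keys, k)
		}
	}
	// Earliest-timestamp keys first (assumed policy), key name as tie-break.
	sort.Slice(keys, func(i, j int) bool {
		a, b := pendingByKey[keys[i]][0].ts, pendingByKey[keys[j]][0].ts
		if a != b {
			return a < b
		}
		return keys[i] < keys[j]
	})
	if lim.maxKeysPerBundle > 0 && len(keys) > lim.maxKeysPerBundle {
		keys = keys[:lim.maxKeysPerBundle]
	}
	var bundle []element
	for _, k := range keys {
		els := pendingByKey[k]
		n := len(els)
		if lim.maxElementsPerKey > 0 && n > lim.maxElementsPerKey {
			n = lim.maxElementsPerKey
		}
		bundle = append(bundle, els[:n]...)
		if n == len(els) {
			delete(pendingByKey, k) // key fully drained
		} else {
			pendingByKey[k] = els[n:] // leftovers wait for a later bundle
		}
	}
	return bundle
}

func main() {
	pending := map[string][]element{
		"a": {{"a", 3}, {"a", 4}, {"a", 5}},
		"b": {{"b", 1}, {"b", 2}},
		"c": {{"c", 7}},
	}
	b := buildBundle(pending, bundleLimits{maxKeysPerBundle: 2, maxElementsPerKey: 2})
	fmt.Println(b)                                                // [{b 1} {b 2} {a 3} {a 4}]
	fmt.Println(len(pending["a"]), len(pending["b"]), len(pending["c"])) // 1 0 1
}
```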

I recognize that you may have already prototyped a heap-based approach or have 
a different data structure in mind for this. If so, I'd rather build on your 
thinking than duplicate work. Are these scheduling optimizations currently an 
active priority for the core team? If so, and if a maintainer has the time and 
interest to mentor it, I would love to draft this formally as my GSoC
proposal.

Also, I would love to join the ASF Slack workspace to connect with the
community. Could someone please send an invite link to this email address?

Thank you for your time,
Mathijs Deelen

Github: mathdee
