Hello everyone,

My name is Mathijs. I recently contributed PR #37379, where I fixed the E2BIG OS argument-limit error in the Python SDK worker boot sequence, and I am currently preparing a systems-focused proposal for GSoC 2026.
While looking at Prism and reviewing the internal scheduling code, I noticed the TODO(lostluck) in 'minPendingTimestampLocked': "Can we figure out how to avoid checking every key on every watermark refresh?" Currently, this performs an O(n) linear scan over pendingByKeys inside a locked critical section on every watermark tick.

I would like to propose a GSoC project to systematically resolve this and other stateful scheduling bottlenecks in Prism. The core of my proposal would focus on three phases:

1. The Watermark Bottleneck: Replacing the O(n) linear scan with a key-indexed min-heap to reduce the refresh cost to O(log n), keeping the critical section as short as possible.

2. Stateful Parallelism: Addressing the TODO in 'buildEventTimeBundle' by implementing configurable limits for keys per bundle and elements per key, optimizing how Prism schedules highly concurrent stateful stages.

3. Regression Infrastructure: Building a permanent benchmarking suite for Prism's scheduling engine to formally characterize these improvements and potentially catch future O(n) regressions in the hot path.

I recognize that you may have already prototyped a heap-based approach or have a different data structure in mind for this. If so, I'd rather build on your thinking than duplicate work.

Are these scheduling optimizations currently an active priority for the core team? If so, and if a maintainer has the time and interest to mentor the project, I would love to formally draft this as my GSoC proposal.

Also, I would love to join the ASF Slack workspace to connect with the community. Could someone please send an invite link to this email address?

Thank you for your time,

Mathijs Deelen
GitHub: mathdee
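
P.S. To make phase 1 a bit more concrete, here is a rough sketch of the kind of key-indexed min-heap I have in mind. It is not based on Prism's actual types: entry, tsHeap, keyedMinHeap and their methods are hypothetical names of my own, keys are plain strings, and timestamps are plain int64 values for simplicity. It only relies on Go's standard container/heap package, and it ignores locking entirely.

```go
package main

import "container/heap"

// entry holds the minimum pending timestamp for one key.
// Hypothetical type for illustration; not taken from Prism.
type entry struct {
	key   string
	ts    int64
	index int // current position in the heap slice, kept up to date by Swap/Push
}

// tsHeap is a min-heap of entries ordered by timestamp.
type tsHeap []*entry

func (h tsHeap) Len() int           { return len(h) }
func (h tsHeap) Less(i, j int) bool { return h[i].ts < h[j].ts }
func (h tsHeap) Swap(i, j int) {
	h[i], h[j] = h[j], h[i]
	h[i].index = i
	h[j].index = j
}

func (h *tsHeap) Push(x any) {
	e := x.(*entry)
	e.index = len(*h)
	*h = append(*h, e)
}

func (h *tsHeap) Pop() any {
	old := *h
	n := len(old)
	e := old[n-1]
	old[n-1] = nil // drop the reference so it can be collected
	*h = old[:n-1]
	return e
}

// keyedMinHeap pairs the heap with a key index so that a single key
// can be updated or removed without scanning all keys.
type keyedMinHeap struct {
	items tsHeap
	byKey map[string]*entry
}

func newKeyedMinHeap() *keyedMinHeap {
	return &keyedMinHeap{byKey: make(map[string]*entry)}
}

// Set inserts key or updates its timestamp in O(log n).
func (k *keyedMinHeap) Set(key string, ts int64) {
	if e, ok := k.byKey[key]; ok {
		e.ts = ts
		heap.Fix(&k.items, e.index)
		return
	}
	e := &entry{key: key, ts: ts}
	k.byKey[key] = e
	heap.Push(&k.items, e)
}

// Remove drops key in O(log n); unknown keys are ignored.
func (k *keyedMinHeap) Remove(key string) {
	e, ok := k.byKey[key]
	if !ok {
		return
	}
	heap.Remove(&k.items, e.index)
	delete(k.byKey, key)
}

// Min reports the smallest pending timestamp without scanning the keys.
// The second result is false when no keys are pending.
func (k *keyedMinHeap) Min() (int64, bool) {
	if len(k.items) == 0 {
		return 0, false
	}
	return k.items[0].ts, true
}
```

The idea is that each change to a key's pending elements would call Set or Remove, so the watermark refresh itself only needs to read Min. Whether this maps cleanly onto how pendingByKeys is maintained today is exactly the kind of thing I would want to work out with a mentor.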
