Hi Beam devs, I’m *Junaid Shaukat*, a GSoC 2026 contributor for Apache Beam, mentored by *Jan Lukavský* on a Portable Kafka Streams runner (#18479 <https://github.com/apache/beam/issues/18479>)
Per my mentor’s suggestion, I’m re-sharing the design document and proposal here to get broader community feedback and traction. *Design doc:* https://docs.google.com/document/d/1BBMURhSG4SxPcvvnKMTrmnKCr_jhXL6R4TBDBW7zsy8/edit?usp=sharing *Proposal: * https://docs.google.com/document/d/1NbFrw_-krXNM_0t4XFaa6WLIM-xK7IdCdP0PFTXIRi8/edit?usp=sharing *What we’re aiming for (v1 skeleton)*: a Beam portable runner on Kafka Streams using the Processor API—ingestion, fused executable stages via the Fn API, internal repartitions for GBK/Combine, RocksDB-backed state, and correctness-oriented execution aligned with the prototype design. *Feedback I’d especially appreciate* *1. Watermark management *— per-partition progress and how to propagate a safe event-time frontier across stages (including after repartition), especially in the “vector clock / frontier” style we sketch in the doc. *2. Partition + task metadata* — how to treat changing partition assignment / rescaling in Kafka Streams while keeping watermark semantics sound. *3. Anything risky or missing* in translation boundaries, bundles/commits, or ValidatesRunner expectations we should address before the first upstream PRs land. I’d be grateful for any comments from “high-level direction” to “this paragraph is wrong because…”. I’ll incorporate feedback into the doc and post a short summary of decisions back on the GitHub issue. Thanks you for your time. Best, Junaid Shaukat https://github.com/junaiddshaukat
