Thank you for the design document, Junaid.

We would like to ask anyone interested to comment on the described design so that we can catch any rough edges before implementation starts.

Thanks!
 Jan

On 5/4/26 10:26, Junaid wrote:
Hi Beam devs,

I’m *Junaid Shaukat*, a GSoC 2026 contributor for Apache Beam, mentored by *Jan Lukavský* on a Portable Kafka Streams runner (#18479 <https://github.com/apache/beam/issues/18479>).

Per my mentor’s suggestion, I’m re-sharing the design document and proposal here to get broader community feedback and traction.

*Design doc:* https://docs.google.com/document/d/1BBMURhSG4SxPcvvnKMTrmnKCr_jhXL6R4TBDBW7zsy8/edit?usp=sharing
*Proposal:* https://docs.google.com/document/d/1NbFrw_-krXNM_0t4XFaa6WLIM-xK7IdCdP0PFTXIRi8/edit?usp=sharing

*What we’re aiming for (v1 skeleton)*: a Beam portable runner on Kafka Streams using the Processor API—ingestion, fused executable stages via the Fn API, internal repartitions for GBK/Combine, RocksDB-backed state, and correctness-oriented execution aligned with the prototype design.

*Feedback I’d especially appreciate*
*1. Watermark management*: per-partition progress and how to propagate a safe event-time frontier across stages (including after repartition), especially in the “vector clock / frontier” style we sketch in the doc.
*2. Partition + task metadata*: how to treat changing partition assignment / rescaling in Kafka Streams while keeping watermark semantics sound.
*3. Anything risky or missing* in translation boundaries, bundles/commits, or ValidatesRunner expectations we should address before the first upstream PRs land.
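
As a starting point for discussion on items 1 and 2, here is a minimal sketch of one way a per-task frontier could be tracked. It is an illustration under stated assumptions, not the mechanism from the design doc: the class `WatermarkFrontier` and its methods are hypothetical names, it uses only the JDK (no Beam or Kafka Streams APIs), and it assumes the safe frontier is the minimum of the per-partition watermarks over the currently assigned partitions.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper for illustration; not part of Beam or Kafka Streams. */
final class WatermarkFrontier {
  /** Marks a partition that has not reported any progress yet. */
  static final long UNKNOWN = Long.MIN_VALUE;

  // Latest known watermark (epoch millis) per assigned input partition.
  private final Map<Integer, Long> perPartition = new HashMap<>();
  // Last frontier handed out; kept so the output never moves backwards.
  private long emitted = UNKNOWN;

  /** Starts tracking a partition; it holds the raw minimum back until it reports. */
  void assign(int partition) {
    perPartition.putIfAbsent(partition, UNKNOWN);
  }

  /** Stops tracking a partition, e.g. after a rebalance moved it elsewhere. */
  void revoke(int partition) {
    perPartition.remove(partition);
  }

  /** Records progress for an assigned partition; regressions are ignored. */
  void update(int partition, long watermarkMillis) {
    perPartition.computeIfPresent(
        partition, (p, old) -> Math.max(old, watermarkMillis));
  }

  /** Returns the minimum over assigned partitions, clamped to be monotonic. */
  long frontier() {
    if (perPartition.isEmpty()) {
      return emitted;
    }
    long min = Long.MAX_VALUE;
    for (long w : perPartition.values()) {
      min = Math.min(min, w);
    }
    emitted = Math.max(emitted, min);
    return emitted;
  }
}
```

The clamp in `frontier()` is where item 2 becomes visible: when a newly assigned partition is behind the frontier already emitted, this sketch keeps the old value, so records from that partition may arrive late relative to it. Whether to hold the frontier back, restore per-partition state on assignment, or accept lateness is exactly the kind of decision feedback is requested on.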

I’d be grateful for any comments from “high-level direction” to “this paragraph is wrong because…”. I’ll incorporate feedback into the doc and post a short summary of decisions back on the GitHub issue.

Thank you for your time.

Best,
Junaid Shaukat
https://github.com/junaiddshaukat
