Hi Beam devs,

I’m *Junaid Shaukat*, a GSoC 2026 contributor for Apache Beam, mentored by *Jan
Lukavský* on a Portable Kafka Streams runner (#18479
<https://github.com/apache/beam/issues/18479>)

Per my mentor’s suggestion, I’m re-sharing the design document and proposal
here to get broader community feedback and traction.

*Design doc:*
https://docs.google.com/document/d/1BBMURhSG4SxPcvvnKMTrmnKCr_jhXL6R4TBDBW7zsy8/edit?usp=sharing
*Proposal: *
https://docs.google.com/document/d/1NbFrw_-krXNM_0t4XFaa6WLIM-xK7IdCdP0PFTXIRi8/edit?usp=sharing

*What we’re aiming for (v1 skeleton)*: a Beam portable runner on Kafka
Streams using the Processor API—ingestion, fused executable stages via the
Fn API, internal repartitions for GBK/Combine, RocksDB-backed state, and
correctness-oriented execution aligned with the prototype design.


*Feedback I’d especially appreciate*
*1. Watermark management *— per-partition progress and how to propagate a
safe event-time frontier across stages (including after repartition),
especially in the “vector clock / frontier” style we sketch in the doc.
*2. Partition + task metadata* — how to treat changing partition assignment
/ rescaling in Kafka Streams while keeping watermark semantics sound.
*3. Anything risky or missing* in translation boundaries, bundles/commits,
or ValidatesRunner expectations we should address before the first upstream
PRs land.

I’d be grateful for any comments from “high-level direction” to “this
paragraph is wrong because…”. I’ll incorporate feedback into the doc and
post a short summary of decisions back on the GitHub issue.

Thanks you for your time.

Best,
Junaid Shaukat
https://github.com/junaiddshaukat

Reply via email to