Thank you for the design document, Junaid.
We would like to ask anyone interested to comment on the described
design so that we can possibly catch any rough edges before
implementation starts.
Thanks!
Jan
On 5/4/26 10:26, Junaid wrote:
Hi Beam devs,
I’m *Junaid Shaukat*, a GSoC 2026 contributor for Apache Beam,
mentored by *Jan Lukavský* on a Portable Kafka Streams runner (#18479
<https://github.com/apache/beam/issues/18479>)
Per my mentor’s suggestion, I’m re-sharing the design document and
proposal here to get broader community feedback and traction.
*Design doc:*
https://docs.google.com/document/d/1BBMURhSG4SxPcvvnKMTrmnKCr_jhXL6R4TBDBW7zsy8/edit?usp=sharing
*Proposal:
*https://docs.google.com/document/d/1NbFrw_-krXNM_0t4XFaa6WLIM-xK7IdCdP0PFTXIRi8/edit?usp=sharing
*What we’re aiming for (v1 skeleton)*: a Beam portable runner on Kafka
Streams using the Processor API—ingestion, fused executable stages via
the Fn API, internal repartitions for GBK/Combine, RocksDB-backed
state, and correctness-oriented execution aligned with the prototype
design.
*Feedback I’d especially appreciate
*
*1. Watermark management *— per-partition progress and how to
propagate a safe event-time frontier across stages (including after
repartition), especially in the “vector clock / frontier” style we
sketch in the doc.
*2. Partition + task metadata* — how to treat changing partition
assignment / rescaling in Kafka Streams while keeping watermark
semantics sound.
*3. Anything risky or missing* in translation boundaries,
bundles/commits, or ValidatesRunner expectations we should address
before the first upstream PRs land.
I’d be grateful for any comments from “high-level direction” to “this
paragraph is wrong because…”. I’ll incorporate feedback into the doc
and post a short summary of decisions back on the GitHub issue.
Thanks you for your time.
Best,
Junaid Shaukat
https://github.com/junaiddshaukat