Hi everyone,

I'm working on a design for a portable Kafka Streams Runner for Apache
Beam as part of GSoC 2026 (tracking issue:
https://github.com/apache/beam/issues/18479).
Jan Lukavský has been mentoring the design process and we've been
iterating on the document over the past few weeks.

The design targets the Fn API for full SDK support and covers:

- Pipeline translation from proto to Kafka Streams Topology
  (following the Flink portable runner pattern)
- Transform mappings: Read (via Impulse decomposition), ParDo,
  GroupByKey, Combine, Window, Flatten
- Watermark management: fully independent per-partition tracking
  (vector clock model)
- Bundle management: no physical buffering, using KS manual commit
  control (context.commit() with exactly_once_v2)
- State management via KS state stores (RocksDB + changelog topics)
- Non-merging windows only for the skeleton (Fixed, Sliding, Global)

Design doc:
https://docs.google.com/document/d/1BBMURhSG4SxPcvvnKMTrmnKCr_jhXL6R4TBDBW7zsy8/edit?usp=sharing

Several design decisions have been validated through local testing
(e.g., Impulse via bootstrap topic since non-existing topics fail
with MissingSourceTopicException, manual transactional commit control
with exactly_once_v2).

Would appreciate any feedback from the community. Happy to discuss
any aspects of the design.

Thanks,
Junaid Shaukat
https://github.com/junaiddshaukat
https://junaidshaukat.com/

Reply via email to