Hi everyone, I'm working on a design for a portable Kafka Streams Runner for Apache Beam as part of GSoC 2026 (tracking issue: https://github.com/apache/beam/issues/18479). Jan Lukavský has been mentoring the design process and we've been iterating on the document over the past few weeks.
The design targets the Fn API for full SDK support and covers:

- Pipeline translation from proto to Kafka Streams Topology (following the Flink portable runner pattern)
- Transform mappings: Read (via Impulse decomposition), ParDo, GroupByKey, Combine, Window, Flatten
- Watermark management: fully independent per-partition tracking (vector clock model)
- Bundle management: no physical buffering, using KS manual commit control (context.commit() with exactly_once_v2)
- State management via KS state stores (RocksDB + changelog topics)
- Non-merging windows only for the skeleton (Fixed, Sliding, Global)

Design doc: https://docs.google.com/document/d/1BBMURhSG4SxPcvvnKMTrmnKCr_jhXL6R4TBDBW7zsy8/edit?usp=sharing

Several design decisions have been validated through local testing (e.g., Impulse via bootstrap topic since non-existing topics fail with MissingSourceTopicException, manual transactional commit control with exactly_once_v2).

Would appreciate any feedback from the community. Happy to discuss any aspects of the design.

Thanks,
Junaid Shaukat
https://github.com/junaiddshaukat
https://junaidshaukat.com/
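
P.S. For anyone who wants a concrete picture of the per-partition watermark tracking mentioned above, here is a minimal sketch in plain Java. It is illustrative only and not taken from the design doc: the class name PartitionWatermarks and its methods are hypothetical, and using the minimum across partitions as the combined watermark is an assumption about how the per-partition vector would be consumed downstream.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: keeps one independent watermark per input partition. */
final class PartitionWatermarks {
    // partition id -> highest watermark observed so far (epoch millis)
    private final Map<Integer, Long> watermarks = new HashMap<>();
    private final int partitionCount;

    PartitionWatermarks(int partitionCount) {
        this.partitionCount = partitionCount;
    }

    /** Advances a single partition's watermark; it never moves backwards. */
    void observe(int partition, long timestampMillis) {
        if (partition < 0 || partition >= partitionCount) {
            throw new IllegalArgumentException("unknown partition " + partition);
        }
        watermarks.merge(partition, timestampMillis, Math::max);
    }

    /**
     * Assumed combination rule: the minimum over all partitions.
     * Returns Long.MIN_VALUE until every partition has reported at least once.
     */
    long combined() {
        if (watermarks.size() < partitionCount) {
            return Long.MIN_VALUE;
        }
        long min = Long.MAX_VALUE;
        for (long w : watermarks.values()) {
            min = Math.min(min, w);
        }
        return min;
    }
}
```

In this sketch each partition advances on its own, so a slow or idle partition holds back only the combined value and not the entries of the other partitions. How idle partitions should be handled is left open here.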
