Thanks for the great write up.

I have a high level question. Why would Python SDK need an UnboundedSource
/ UnboundedReader API ?

Java SDK has these since they predated the Splittable DoFn API. We added
the Unbounded wrapper (as you noted) to make old sources implemented in
this API available to runners that only support Splittable DoFn. Python
SDK, on the other hand, only supports portable runners, where Splittable
DoFn support is available and any new sources can be implemented using
Splittable DoFn API:
https://beam.apache.org/documentation/io/developing-io-overview/

+1 for adding support for the Watch transform.

Thanks,
Cham

On Mon, May 4, 2026 at 5:24 AM Elia Liu <[email protected]> wrote:

> I'd also like to share two documents with the community for feedback:
>
> • Design doc (also serving as my GSoC proposal):
>
> https://drive.google.com/file/d/1M_cg0iQh0D5AYyPRRoo88qcs2LKIcENW/view?usp=sharing
>
> • Proof of concept: https://github.com/Eliaaazzz/beam-native-streaming-poc
>
> The design doc and PoC cover the proposed SDF architecture, watermark
> and checkpoint semantics, and the test suite.
>
> Any feedback, suggestions, or concerns are very welcome, especially
> around watermark propagation, checkpoint encoding, and runner
> compatibility. Happy to iterate based on community feedback!
>
> On Fri, May 1, 2026 at 12:05 PM Elia LIU <[email protected]> wrote:
> >
> > Hi all,
> >
> > I'm Elia, a final-year Computer Science student at the University of
> > Melbourne. I'm writing to introduce myself as one of this year's GSoC
> > contributors under Apache Beam. Over the next few months I'll be
> > working with Yi Hu on Native Streaming Transforms for the Python SDK,
> > and I'm really looking forward to it.
> >
> > I've been contributing to Beam in smaller ways over the past several
> > months, and I've come to deeply appreciate how thoughtful and
> > welcoming this community is. I'm grateful for the chance to take on a
> > more sustained piece of work and to engage more closely with all of
> > you.
> >
> > === Project Summary ===
> >
> > Beam's Python SDK is increasingly the first choice for ML and
> > data-intensive pipelines, making robust native streaming APIs more
> > important than ever. However, two essential streaming primitives,
> > UnboundedSource (https://github.com/apache/beam/issues/19137) and
> > Watch (https://github.com/apache/beam/issues/21521), remain
> > unavailable in Python despite being long established in the Java SDK.
> > Today, Python developers who need these capabilities either pull in
> > cross-language transforms that add Java dependencies and gRPC
> > overhead, or wrestle with low-level RestrictionTracker internals.
> >
> > This project aims to address the gap by porting both primitives
> > natively to Python on top of Beam's existing SDF framework. The
> > UnboundedSource wrapper adapts the legacy reader API into a Splittable
> > DoFn that handles checkpointing, watermark progression, deduplication,
> > and splitting. The Watch transform adds periodic polling with
> > composable termination conditions and stable dedup behavior.
> >
> > Porting both primitives natively abstracts that complexity behind
> > clean and Pythonic interfaces. For ML and data-intensive pipelines,
> > this means streaming ingestion from sources like Kafka and Pub/Sub,
> > continuous polling for new training data or model artifacts, and
> > event-driven feature pipelines can all be expressed natively in
> > Python, without cross-language overhead or low-level SDF code.
> >
> > === Planned Deliverables ===
> >
> > * D1: A Python UnboundedSource API with its SDF-based wrapper
> > * D2: A native Watch transform with PollFn and TerminationConditions
> > * D3: A test suite spanning DirectRunner and Dataflow
> > * D4: Supporting documentation including docstrings, programming-guide
> > updates, and migration notes
> >
> > I'd genuinely value any feedback, suggestions, or historical context
> > from those of you who've worked on related areas. I'm sure there are
> > nuances and edge cases I haven't fully thought through yet, and I'd be
> > very glad to hear from anyone willing to share thoughts.
> >
> > Many thanks to Yi Hu for the mentorship, and to the wider community
> > for the warm reception so far. Looking forward to working alongside
> > you all over the coming months.
> >
> > Best regards,
> > Elia
>

Reply via email to