Hi everyone,

My name is Elia. I am preparing a proposal for GSoC 2026 to tackle the project idea originally proposed by Yi Hu: bringing native streaming primitives, specifically the UnboundedSource wrapper and the Watch transform, to the Python SDK.

*The Problem:* Currently, Python developers lack convenient primitives for continuous data ingestion. While the Java SDK has mature tools like UnboundedSource and Watch, Python users often have to implement complex Splittable DoFn (SDF) logic from scratch just to achieve basic streaming IO patterns (e.g., polling a model registry or tracking a growing directory).

*The Proposal:* Based on this GSoC project idea, I have drafted a proposal to implement an UnboundedSource wrapper built on SDF and a native Watch transform for periodic polling. The current draft incorporates some helpful early feedback from Yi Hu. I am now sharing it here to gather broader community feedback, structural reviews, or any concerns.

You can find the full proposal and technical design here:
https://docs.google.com/document/d/1gi7nqviUjvLIUdeg1I7KslW9qunD1lKEarMMONhyZeQ/edit?tab=t.0

I would deeply appreciate any thoughts or suggestions from the community. Thank you for your time!

Best regards,
Elia (Zhenyu Liu)
