Eliaaazzz commented on issue #21521: URL: https://github.com/apache/beam/issues/21521#issuecomment-4055812109
I put together a focused PoC for the Python `Watch` transform to validate the core execution model before trying to build a full implementation. PoC branch: `https://github.com/Eliaaazzz/beam/tree/gsoc/poc` For the `Watch` side specifically, the PoC currently has **56 passing tests**. The current prototype is mainly exploring whether the main `Watch` lifecycle can be represented cleanly in Python using an SDF-based implementation shape. So far, it includes: - a builder-style `Watch.growth_of(...)` API with poll interval / termination configuration - a `PollResult` abstraction carrying outputs, optional watermark, and completion - composable per-input termination conditions - a `GrowthState` hierarchy separating active polling state from replay state - a custom `GrowthStateCoder` with structured encoding and backward-compatibility decode paths - a `GrowthStateTracker` with explicit three-case `try_split()` behavior modeled after Java `Watch.GrowthTracker` - an SDF `process()` flow that distinguishes replay from active polling - output dedup via `output_key_fn` and stable hashing - watermark / completion handling in the polling path The goal of this first step was not full parity or production hardening yet, but to de-risk the design and get executable validation around the hardest parts first: state evolution, resumption, split behavior, termination behavior, and watermark-related semantics. If this direction looks reasonable, my next step would be to tighten it further against the Java semantics and then expand it toward a more complete implementation. Feedback would be very helpful. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
