I reviewed your PR (https://github.com/apache/beam/pull/9199) and Anton's as another reference (https://github.com/apache/beam/pull/4742). Nice work. I thought I would summarize a little for the list. I think we have not done much with retractions because it seemed like a big job. You have both shown that the core is maybe not that hard to implement. But it will have a lot of user-facing consequences that we have to test very carefully.

 - the technical changes are primarily to ReduceFn (aka GroupAlsoByWindow), which is the core of stateful aggregation, and the change is straightforward, which is cool
 - the boilerplate through the codebase is a lot (most of the ~1000 lines of both PRs) but it could have been a lot worse, so we are lucky :-)

Here are steps forward that I can think of:

 - we need backwards compatibility, which is trivial because it is a new accumulation mode
 - we need a little more mathematical analysis (at least I do, to have more confidence there are no bad surprises)
 - we need more description of the user-facing impact and API changes (same reason)
 - lots and lots of @ValidatesRunner tests
 - some opinions here from runner authors about efficiency in their systems
 - merging/unmerging window support matters, since that is a key retractions use case too, but in my opinion I would save it for later (if you've seen Tyler's hack to do Validity Windows, that is even harder)
 - we also need protections so that things which will not work with retractions are rejected, which will be all existing user DoFns and all sinks

I've got an old doc made w/ Anton, Ben, and a couple of others that I can try to find time to edit and share. It deals a little bit with the mathematics (messy and incomplete) and the API/compatibility questions (more useful, probably). You've seen it offline but the list has not seen a public version. I was going to try to merge it with yours, but I can get it out quicker if I just allow for the overlaps.

Kenn

On Mon, Aug 12, 2019 at 9:47 PM Rui Wang <[email protected]> wrote:

> Hello!
>
> I have also been building a proof of concept(PR
> <https://github.com/apache/beam/pull/9199>), which implements the
> streaming wordcount example in the design doc.
>
> What is missing in the PoC is ordering guarantee implementation in sink
> (which I am working on).
>
>
> -Rui
>
> On Wed, Jul 24, 2019 at 1:37 PM Rui Wang <[email protected]> wrote:
>
>> Hello!
>>
>> In case you are not aware of, I have added a modified streaming wordcount
>> example at the end of the doc to illustrate retractions.
>>
>>
>> -Rui
>>
>> On Wed, Jul 10, 2019 at 10:58 AM Rui Wang <[email protected]> wrote:
>>
>>> Hi Community,
>>>
>>> Retractions is a part of core Beam model [1]. I come up with a doc to
>>> discuss retractions about use cases, model and API (see the link below).
>>> This is a very beginning discussion on retractions but I do hope we can
>>> have a consensus and make retractions implemented in a useful way
>>> eventually.
>>>
>>>
>>> doc link:
>>> https://docs.google.com/document/d/14WRfxwk_iLUHGPty3C6ZenddPsp_d6jhmx0vuafXqmE/edit?usp=sharing
>>>
>>>
>>> [1]: https://issues.apache.org/jira/browse/BEAM-91
>>>
>>>
>>> -Rui
>>>
>>
