Hi Community, BeamSQL currently does not support unbounded-unbounded join with non-default trigger. It is because:
- Discarding mode does not work for outer joins because of lacking of ability to retract pre-emitted values. You can think about an example in which a tuple of (left_row, null) needed to be retracted if the matched right_row appears since last trigger fired. - Accumulating mode *theoretically* can support unbounded-unbounded join because it's supposed to always "overwrite" previous result. However in practice, for join use cases such overwriting is too expensive. It would be much more efficient if small changes in inputs of join only cause small changes to downstream to compute. - Both discarding mode and accumulating mode are not sufficient to refine materialized data. Meanwhile, [1] has kicked off a discussion on retractions in Beam model. I have been collecting people's feedback and generally speaking people agree that retractions are useful for some use cases. Thus I propose to combine SQL join with retractions to support multiple-triggering SQL Join. I think SQL join is a good start for supporting retraction in Beam with the following caveats: 1. multiple-triggering SQL Join is a useful feature. 2. SQL join is an opportunity for us to figure out implementation details of retraction by building it for a well defined use case. 3. Supporting retraction should not cause performance regression on existing pipelines, or require changes on existing pipelines. What do you think? [1]: https://lists.apache.org/thread.html/bb2d40b1bea8b21fbbb7caf599fabba823da357768ceca8ea2363789@%3Cdev.beam.apache.org%3E -Rui