I took a look and mentioned the PR to a few folks. Couple of thoughts: - We should avoid Beam adding a high-level functionality only for Batch. - Supporting Windowing/Triggers would likely be non-trivial and worth considering early in the design. - If you'd like to continue working on this, I would suggest to start a document and gradually cover the following (with help/feedback from others in this list): - the motivational example, available workarounds (if any) - background on aspects of combiner implementation that are relevant. - proposed approach and considered alternatives - any runner-specific considerations.
Thanks, Valentyn On Fri, Mar 29, 2024 at 5:06 AM Joey Tran <joey.t...@schrodinger.com> wrote: > I posted a PoC PR [1] for fixing deferred side inputs with combiners in > the python SDK. Would someone be willing to take a look at it? > > I have it working but could use some feedback on where to take it next. It > looks like bundle processor combiner operations don't currently support > side inputs [2] so I added a conditional in `CombinePerKey` that checks > whether it was instantiated with a side input and if so, use a ParDo-based > version of the combiner so we can piggyback off of the Do operations > implementation of side inputs rather than reimplementing it for the > combiner operation. > > [1] https://github.com/apache/beam/pull/30743 > [2] > https://github.com/apache/beam/blob/e3fee5156b3515f96dc5ba90ea2fbc6f6be2bd34/sdks/python/apache_beam/runners/worker/operations.py#L1146 >