Hi Reuven,

I hadn't looked at that particular one, but looking into it now, it seems that it is (like the "classic" join library) built around CoGBK. Is that correct? If so, it essentially means that it:

 - works only for cases where both sides have the same windowfn (that is a limitation of the Flatten that precedes the CoGBK)

 - when using the global window, there has to be a trigger, and (AFAIK) there is no trigger that would guarantee firing after each data element (for early panes), because triggers exist to express the cost-latency tradeoff, not semantics

Moreover, I'd like to define the join semantics so that when elements are available from both sides, the fired pane is ON_TIME, not EARLY. That essentially means the fully general case would be built not around (Co)GBK but around a stateful ParDo. It is true that there are specific cases where this fully general case "degrades" into forms that can be expressed efficiently using (Co)GBK.
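To make the stateful-ParDo idea a bit more concrete, here is a rough plain-Java model of an inner join that emits each matching pair as soon as both sides have it, with no trigger involved. It is only a sketch under assumptions: it is not the Beam API, the class and method names are hypothetical, the in-memory maps stand in for per-key state (in Beam presumably something like a BagState in a stateful DoFn), and windowing, output timestamps, state cleanup and outer-join padding are all left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical plain-Java model (NOT the Beam API) of a per-key stateful
 * inner join: every incoming element is buffered in per-key state and
 * immediately joined with everything already buffered for the other side.
 */
class SymmetricJoinSketch<K, L, R> {

  /** One joined output element (hypothetical type). */
  static final class Joined<K, L, R> {
    final K key;
    final L left;
    final R right;

    Joined(K key, L left, R right) {
      this.key = key;
      this.left = left;
      this.right = right;
    }

    @Override
    public String toString() {
      return key + ":(" + left + ", " + right + ")";
    }
  }

  // In-memory maps standing in for per-key state; never cleaned up here.
  private final Map<K, List<L>> leftState = new HashMap<>();
  private final Map<K, List<R>> rightState = new HashMap<>();

  /** Buffers a left element and joins it with all right elements seen so far for its key. */
  List<Joined<K, L, R>> processLeft(K key, L value) {
    leftState.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    List<Joined<K, L, R>> out = new ArrayList<>();
    for (R r : rightState.getOrDefault(key, List.of())) {
      out.add(new Joined<>(key, value, r));
    }
    return out;
  }

  /** Buffers a right element and joins it with all left elements seen so far for its key. */
  List<Joined<K, L, R>> processRight(K key, R value) {
    rightState.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    List<Joined<K, L, R>> out = new ArrayList<>();
    for (L l : leftState.getOrDefault(key, List.of())) {
      out.add(new Joined<>(key, l, value));
    }
    return out;
  }

  public static void main(String[] args) {
    SymmetricJoinSketch<String, Integer, String> join = new SymmetricJoinSketch<>();
    System.out.println(join.processLeft("a", 1));    // prints []
    System.out.println(join.processRight("a", "x")); // prints [a:(1, x)]
    System.out.println(join.processLeft("a", 2));    // prints [a:(2, x)]
    System.out.println(join.processRight("b", "y")); // prints []
  }
}
```

Over time each key produces exactly the cartesian product of its left and right values, but every pair appears as soon as its second element arrives, which is the "fire per element" behaviour that is hard to guarantee via a trigger on top of CoGBK. The outer variants would need extra handling (e.g. holding back or retracting padded results), which this sketch does not attempt.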

Jan

On 11/22/19 6:47 PM, Reuven Lax wrote:
Have you seen the Join library that is part of schemas? I'm curious whether this fits your needs, or there's something lacking there.

On Fri, Nov 22, 2019 at 12:31 AM Jan Lukavský <[email protected]> wrote:

    Hi,

    based on the roadmap [1], we would like to define and implement a full
    set of (unified) stream-stream joins. That would include:

      - joins (left, right, full outer) on the global window with an
        "immediate trigger"

      - joins with different windowing functions on the left and right sides

    The approach would be to define these operations in a natural way, so
    that the definition is aligned with how current joins work (same
    windows, cartesian product of values with the same keys, output
    timestamp projected to the end of the window, etc.). Because this
    should be a generic approach, this effort should probably be part of
    the join library, so that it can then be reused by other components,
    too (e.g. SQL).

    The question is: is (or was) there any effort that we can build upon,
    or should this be designed from scratch?

    Jan

    [1] https://beam.apache.org/roadmap/euphoria/
