Hi,

If you're not using Beam SQL JOINs, you're not affected.

In Short

Beam SQL JOIN operation does not work well with multiple trigger firings.

More Detail

Beam SQL JOIN is a CoGBK under the hood. It joins available elements
per-pane. This means that:

   - in discarding mode we're joining only new elements which arrived since
   last trigger firing, forgetting about any past elements;
   - in accumulating mode in addition to joining new elements we're also
   emitting join results we have already emitted when trigger fired last time;

This behavior cannot be configured or handled in pure SQL.

Even More Detail

Here
<https://docs.google.com/document/d/1V-ZgKVTwHdNSGlQWncWIzcf_Rw2oLKZFSkU43scLff4/edit#>

Proposal

Short term, allow only trigger configuration which we can reason about,
rejecting anything else.

For example we know that non-global windows with default trigger with zero
allowed lateness will fire once per window. In this case we will join all
elements in the window once after watermark is past end of window.

Long term, retractions will allow correct handling of this situation.

Jira <https://issues.apache.org/jira/browse/BEAM-3345>, Pull Request
<https://github.com/apache/beam/pull/4642>

Regards,
Anton

Reply via email to