Francesco Guardiani created FLINK-24466:
-------------------------------------------
Summary: Interval Join late events handling behaviour is not
consistent
Key: FLINK-24466
URL: https://issues.apache.org/jira/browse/FLINK-24466
Project: Flink
Issue Type: Bug
Components: Table SQL / Runtime
Reporter: Francesco Guardiani
Attachments: Fix_late_events_filtering_for_interval_join.patch
Interval Join handles late events emitting them in the output, as a padded row.
This behavior is also tested extensively in {{RowTimeIntervalJoinTest}}.
The problem with this behavior is the way an event is considered "late" or not:
in order to distinguish between the two, {{RowTimeIntervalJoin}} uses the
{{ctx.timerService().currentWatermark()}} to find out if an event is later than
the last received watermark or not. But that method returns the "combined"
watermark across all the keys, partitions and *input streams*, that is if one
of the two streams goes "slower" than the other one, the returned watermark is
going to be the minimum among the two.
This means that our late events handling effectively works only if the two
streams run "at the same pace", otherwise we'll just see what we consider _late
events_ for one of the two streams as joined.
To observe this behavior, just run the test
{{IntervalJoinITCase#testRowTimeInnerJoinWithEquiTimeAttrs}} in this revision
https://github.com/apache/flink/commit/7033cbfe404bea1519d3342a611e2f92768d70f9
several times and you'll see that after a couple of runs it fails, joining one
of the {{"should-be-discarded"}} records. Those records are way behind the
watermark - 1 second, as defined.
You'll find attached in the issue a small patch to show how this could be fixed.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)