Fabian Hueske created FLINK-39719:
-------------------------------------

             Summary: TemporalRowTimeJoinOperator produces wrong results for 
late probe-side records
                 Key: FLINK-39719
                 URL: https://issues.apache.org/jira/browse/FLINK-39719
             Project: Flink
          Issue Type: Bug
          Components: Table SQL / Runtime
    Affects Versions: 2.3.0
            Reporter: Fabian Hueske


The `TemporalRowTimeJoinOperator` can produce wrong results for late-arriving 
probe-side records.
This is expected, because a core property of the operator is that it evicts 
state as the watermark progresses. A late-arriving probe-side record might 
therefore not find the correct build-side record to join against. It could 
find the correct record, a newer (incorrect) record, or no record at all.
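
To illustrate the effect, here is a minimal sketch in plain Java. It is not the 
operator's code: `VersionedBuildState`, its methods, and the eviction rule (keep 
only the version valid at the watermark plus anything newer) are assumptions 
made for this example. It reproduces only the "no record" outcome.

```java
import java.util.Map;
import java.util.TreeMap;

/** Simplified model of versioned build-side state for one join key (hypothetical, not Flink code). */
class VersionedBuildState {
    // Build-side versions ordered by row time: rowTime -> value.
    private final TreeMap<Long, String> versions = new TreeMap<>();

    void addVersion(long rowTime, String value) {
        versions.put(rowTime, value);
    }

    /** Returns the version valid at probeTime, or null if no such version is in state. */
    String lookup(long probeTime) {
        Map.Entry<Long, String> entry = versions.floorEntry(probeTime);
        return entry == null ? null : entry.getValue();
    }

    /** Assumed eviction rule: keep the version valid at the watermark, drop all older ones. */
    void evict(long watermark) {
        Long latestValid = versions.floorKey(watermark);
        if (latestValid != null) {
            versions.headMap(latestValid, false).clear();
        }
    }
}

class LateProbeDemo {
    public static void main(String[] args) {
        VersionedBuildState state = new VersionedBuildState();
        state.addVersion(10L, "rate=1.0");
        state.addVersion(20L, "rate=1.2");

        // On-time probe at t=15 sees the version written at t=10.
        System.out.println(state.lookup(15L)); // prints "rate=1.0"

        // The watermark advances to 25 and the version from t=10 is evicted.
        state.evict(25L);

        // The same probe time, arriving late, no longer finds a match.
        System.out.println(state.lookup(15L)); // prints "null"
    }
}
```

In this model, a probe record at t=15 joins correctly if it arrives before the 
watermark passes 20, and finds nothing if it arrives afterwards.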

This behavior causes problems especially if the probe-side input is a 
retraction stream. Retraction messages (DELETE, UPDATE_BEFORE) are prone to 
being late because their event timestamp must match the event time of the 
corresponding INSERT record.
If a retraction row is joined against an incorrect build-side record, the 
result no longer matches the row that was emitted for the original INSERT, 
so downstream operators cannot resolve the retraction message.

Since `TemporalRowTimeJoinOperator` is designed to evict state on watermark 
progress, we should drop late probe-side records. Most other event/row-time 
based operators, such as windowed aggregations, row-time OVER aggregations, 
and interval joins, have the same behavior.
Similar to these operators, we should track the number of dropped late records 
in a metric.
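
A minimal sketch of the proposed behavior, in plain Java rather than against 
Flink's operator and metric APIs. `LateProbeFilter`, its method names, the plain 
`long` counter standing in for a metric, and the lateness condition 
`rowTime <= currentWatermark` are all assumptions for illustration; the actual 
operator may use a different comparison.

```java
/** Hypothetical lateness check for probe-side records (not Flink code). */
class LateProbeFilter {
    private long currentWatermark = Long.MIN_VALUE;

    // Stand-in for a metric counter that tracks dropped late records.
    private long numLateRecordsDropped = 0L;

    /** Watermarks only move forward; smaller values are ignored. */
    void advanceWatermark(long watermark) {
        currentWatermark = Math.max(currentWatermark, watermark);
    }

    /**
     * Returns true if the probe-side record should be processed,
     * false if it is late and therefore dropped and counted.
     */
    boolean accept(long rowTime) {
        if (rowTime <= currentWatermark) {
            numLateRecordsDropped++;
            return false;
        }
        return true;
    }

    long getNumLateRecordsDropped() {
        return numLateRecordsDropped;
    }
}

class LateProbeFilterDemo {
    public static void main(String[] args) {
        LateProbeFilter filter = new LateProbeFilter();
        filter.advanceWatermark(100L);

        System.out.println(filter.accept(150L)); // prints "true": ahead of the watermark
        System.out.println(filter.accept(90L));  // prints "false": late, dropped
        System.out.println(filter.getNumLateRecordsDropped()); // prints "1"
    }
}
```

Because the check happens before any state lookup, a late retraction row is 
never joined against a build-side version that differs from the one its INSERT 
saw.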

Dropping late records is of course not a perfect solution, but it is 1) 
predictable (and IMO expected) behavior and 2) no worse than producing wrong 
results.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)