Stepan Stepanishchev created FLINK-38916:
--------------------------------------------

             Summary: MultiJoin produces incorrect results for OR join 
conditions when parallelism is greater than 1
                 Key: FLINK-38916
                 URL: https://issues.apache.org/jira/browse/FLINK-38916
             Project: Flink
          Issue Type: Bug
          Components: Table SQL / Planner
    Affects Versions: 2.2.0, 2.1.0, 2.0.0
            Reporter: Stepan Stepanishchev


[This PRĀ |https://github.com/apache/flink/pull/27415] adds integration tests 
that expose incorrect results produced by the MultiJoin operator for queries 
with OR conditions in join predicates.
For example, the following query:
{code:sql}
SELECT u.user_id, u.name, o.order_id, p.payment_id, p.price
    FROM Users u
    JOIN Orders o ON u.user_id = o.user_id
    JOIN Payments p ON o.user_id = p.user_id OR p.price = u.cash
{code}
produces incorrect results when executed with parallelism greater than 1 and 
MultiJoin enabled.

The MultiJoin operator treats this query as if it has a common join key 
(user_id). Therefore, the inputs are distributed by that key. As a result, 
records that satisfy *p.price = u.cash* but have
*p.user_id != o.user_id* are assigned to different subtasks and never joined, 
leading to missing result rows.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to