[ 
https://issues.apache.org/jira/browse/FLINK-38916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-38916:
-----------------------------------
    Labels: pull-request-available  (was: )

> MultiJoin produces incorrect results for OR join conditions when parallelism 
> is greater than 1
> ----------------------------------------------------------------------------------------------
>
>                 Key: FLINK-38916
>                 URL: https://issues.apache.org/jira/browse/FLINK-38916
>             Project: Flink
>          Issue Type: Bug
>          Components: Table SQL / Planner
>    Affects Versions: 2.0.0, 2.1.0, 2.2.0
>            Reporter: Stepan Stepanishchev
>            Priority: Critical
>              Labels: pull-request-available
>
> [This PR|https://github.com/apache/flink/pull/27415] adds integration tests 
> that expose incorrect results produced by the MultiJoin operator for queries 
> with OR conditions in join predicates.
> For example, the following query:
> {code:sql}
> SELECT u.user_id, u.name, o.order_id, p.payment_id, p.price
>     FROM Users u
>     JOIN Orders o ON u.user_id = o.user_id
>     JOIN Payments p ON o.user_id = p.user_id OR p.price = u.cash
> {code}
> produces incorrect results when executed with parallelism greater than 1 and 
> MultiJoin enabled.
> The MultiJoin operator treats this query as if all inputs shared a common 
> join key (user_id), so it partitions every input by that key. As a result, 
> records that satisfy *p.price = u.cash* but have *p.user_id != o.user_id* 
> are routed to different subtasks and are never joined, leading to missing 
> result rows.
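
The effect described above can be reproduced outside Flink with a small simulation. This is only a sketch under stated assumptions: the sample rows are hypothetical (not taken from the PR's tests), `user_id % parallelism` stands in for Flink's actual key hashing, and the helper names `join` and `partitioned_join` are invented for illustration. It does not model the MultiJoin operator itself, only the consequence of partitioning all three inputs by user_id.

```python
from itertools import product

# Hypothetical sample rows, chosen so that one payment matches only via price = cash.
USERS = [(1, "Alice", 100), (2, "Bob", 50)]   # (user_id, name, cash)
ORDERS = [(10, 1), (20, 2)]                    # (order_id, user_id)
PAYMENTS = [(100, 1, 30), (200, 2, 100)]       # (payment_id, user_id, price)


def join(users, orders, payments):
    """Evaluate the query from the report over the given rows (nested-loop join)."""
    rows = set()
    for (uid, name, cash), (oid, o_uid), (pid, p_uid, price) in product(
        users, orders, payments
    ):
        if uid == o_uid and (o_uid == p_uid or price == cash):
            rows.add((uid, name, oid, pid, price))
    return rows


def partitioned_join(parallelism):
    """Partition every input by user_id, join inside each subtask, union the results.

    Assumption: modulo is a stand-in for the real hash-based key distribution.
    """
    rows = set()
    for subtask in range(parallelism):
        local_users = [u for u in USERS if u[0] % parallelism == subtask]
        local_orders = [o for o in ORDERS if o[1] % parallelism == subtask]
        local_payments = [p for p in PAYMENTS if p[1] % parallelism == subtask]
        rows |= join(local_users, local_orders, local_payments)
    return rows


expected = join(USERS, ORDERS, PAYMENTS)
missing = expected - partitioned_join(2)
print(sorted(missing))
```

With these rows, the payment with payment_id 200 belongs to user 2 but its price equals user 1's cash, so the row combining user 1 with that payment is found by the unpartitioned join and lost when the inputs are split across two subtasks. With a single subtask the two results agree.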



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
