[
https://issues.apache.org/jira/browse/FLINK-38916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated FLINK-38916:
-----------------------------------
Labels: pull-request-available (was: )
> MultiJoin produces incorrect results for OR join conditions when parallelism
> is greater than 1
> ----------------------------------------------------------------------------------------------
>
> Key: FLINK-38916
> URL: https://issues.apache.org/jira/browse/FLINK-38916
> Project: Flink
> Issue Type: Bug
> Components: Table SQL / Planner
> Affects Versions: 2.0.0, 2.1.0, 2.2.0
> Reporter: Stepan Stepanishchev
> Priority: Critical
> Labels: pull-request-available
>
> [This PRĀ |https://github.com/apache/flink/pull/27415] adds integration tests
> that expose incorrect results produced by the MultiJoin operator for queries
> with OR conditions in join predicates.
> For example, the following query:
> {code:sql}
> SELECT u.user_id, u.name, o.order_id, p.payment_id, p.price
> FROM Users u
> JOIN Orders o ON u.user_id = o.user_id
> JOIN Payments p ON o.user_id = p.user_id OR p.price = u.cash
> {code}
> produces incorrect results when executed with parallelism greater than 1 and
> MultiJoin enabled.
> The MultiJoin operator treats this query as if it has a common join key
> (user_id). Therefore, the inputs are distributed by that key. As a result,
> records that satisfy *p.price = u.cash* but have
> *p.user_id != o.user_id* are assigned to different subtasks and never joined,
> leading to missing result rows.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)