[ 
https://issues.apache.org/jira/browse/IMPALA-13991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17948042#comment-17948042
 ] 

ASF subversion and git services commented on IMPALA-13991:
----------------------------------------------------------

Commit d8e18fac619a129270634513359b3386212ff919 in impala's branch 
refs/heads/master from Riza Suminto
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=d8e18fac6 ]

IMPALA-13991: Skip CROSS_JOIN rewrite if subquery is in disjunctive

Inside StmtRewriter.mergeExpr() there is an optimization that set
JoinOperator.CROSS_JOIN under certain scenario. This patch adds criteria
to SKIP that rewrite if subquery is coming from inside disjunctive
expression, regardless of joinConjunct value. If joinConjunct is NOT
NULL, the inlineView maybe correlated through that joinConjunct. If
joinConjunct is NULL, then expr is a (NOT) EXISTS predicate. EXISTS
within a disjunct is not supported yet (see IMPALA-9931).

Testing:
- Add planner and query test for the corner case.
  Before this patch, the query return wrong result.
- Fixed wrong testcase in subquery-rewrite.test.

Change-Id: Iac0deb0b2fb1536684cce2e004156a20b769b9ab
Reviewed-on: http://gerrit.cloudera.org:8080/22815
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Nested loop join compose wrong predicate.
> -----------------------------------------
>
>                 Key: IMPALA-13991
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13991
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>            Reporter: Riza Suminto
>            Assignee: Riza Suminto
>            Priority: Major
>             Fix For: Impala 5.0.0
>
>         Attachments: profile_9c41d9d5e3a79d15_e927d8c400000000.txt
>
>
> Using functional_parquet database, run the following:
> {noformat}
> [localhost:21050] functional_parquet> select * from alltypes where (id in 
> (1000, 1001)) OR ((id in (1, 2)) AND timestamp_col <= (select timestamp_col 
> from alltypestiny order by timestamp_col DESC limit 1));
> Query: select * from alltypes where (id in (1000, 1001)) OR ((id in (1, 2)) 
> AND timestamp_col <= (select timestamp_col from alltypestiny order by 
> timestamp_col DESC limit 1))
> Query submitted at: 2025-04-24 12:06:34 (Coordinator: 
> http://rsuminto-22746:25000)
> Query state can be monitored at: 
> http://rsuminto-22746:25000/query_plan?query_id=9c41d9d5e3a79d15:e927d8c400000000
> +----+----------+-------------+--------------+---------+------------+---------------+------------+-----------------+------------+-------------------------------+------+-------+
> | id | bool_col | tinyint_col | smallint_col | int_col | bigint_col | 
> float_col     | double_col | date_string_col | string_col | timestamp_col     
>             | year | month |
> +----+----------+-------------+--------------+---------+------------+---------------+------------+-----------------+------------+-------------------------------+------+-------+
> | 1  | false    | 1           | 1            | 1       | 10         | 
> 1.10000002384 | 10.1       | 01/01/09        | 1          | 2009-01-01 
> 00:01:00           | 2009 | 1     |
> | 2  | true     | 2           | 2            | 2       | 20         | 
> 2.20000004768 | 20.2       | 01/01/09        | 2          | 2009-01-01 
> 00:02:00.100000000 | 2009 | 1     |
> +----+----------+-------------+--------------+---------+------------+---------------+------------+-----------------+------------+-------------------------------+------+-------+
> Fetched 2 row(s) in 0.11s
> [localhost:21050] functional_parquet> select * from alltypes where (id in 
> (1000, 1001)) OR ((id in (1, 2)) AND timestamp_col <= '2009-04-01 00:01:00');
> Query: select * from alltypes where (id in (1000, 1001)) OR ((id in (1, 2)) 
> AND timestamp_col <= '2009-04-01 00:01:00')
> Query submitted at: 2025-04-24 12:06:43 (Coordinator: 
> http://rsuminto-22746:25000)
> Query state can be monitored at: 
> http://rsuminto-22746:25000/query_plan?query_id=fc40ff3109deea71:cb99a7e200000000
> +------+----------+-------------+--------------+---------+------------+---------------+------------+-----------------+------------+-------------------------------+------+-------+
> | id   | bool_col | tinyint_col | smallint_col | int_col | bigint_col | 
> float_col     | double_col | date_string_col | string_col | timestamp_col     
>             | year | month |
> +------+----------+-------------+--------------+---------+------------+---------------+------------+-----------------+------------+-------------------------------+------+-------+
> | 1    | false    | 1           | 1            | 1       | 10         | 
> 1.10000002384 | 10.1       | 01/01/09        | 1          | 2009-01-01 
> 00:01:00           | 2009 | 1     |
> | 2    | true     | 2           | 2            | 2       | 20         | 
> 2.20000004768 | 20.2       | 01/01/09        | 2          | 2009-01-01 
> 00:02:00.100000000 | 2009 | 1     |
> | 1001 | false    | 1           | 1            | 1       | 10         | 
> 1.10000002384 | 10.1       | 04/11/09        | 1          | 2009-04-11 
> 01:41:04.500000000 | 2009 | 4     |
> | 1000 | true     | 0           | 0            | 0       | 0          | 0.0   
>         | 0.0        | 04/11/09        | 0          | 2009-04-11 
> 01:40:04.500000000 | 2009 | 4     |
> +------+----------+-------------+--------------+---------+------------+---------------+------------+-----------------+------------+-------------------------------+------+-------+
> Fetched 4 row(s) in 0.11s
> [localhost:21050] functional_parquet> select timestamp_col from alltypestiny 
> order by timestamp_col DESC limit 1;
> Query: select timestamp_col from alltypestiny order by timestamp_col DESC 
> limit 1
> Query submitted at: 2025-04-24 12:11:43 (Coordinator: 
> http://rsuminto-22746:25000)
> Query state can be monitored at: 
> http://rsuminto-22746:25000/query_plan?query_id=054746f7ea1bdc26:3711f6f800000000
> +---------------------+
> | timestamp_col       |
> +---------------------+
> | 2009-04-01 00:01:00 |
> +---------------------+
> Fetched 1 row(s) in 0.11s
> {noformat}
>  
> NESTED LOOP JOIN should not take out the less than predicate over 
> timestamp_col as its own equality predicate.
> {noformat}
> 03:NESTED LOOP JOIN [INNER JOIN, BROADCAST]
> |  predicates: timestamp_col <= timestamp_col, (id IN (CAST(1000 AS INT), 
> CAST(1001 AS INT))) OR ((id IN (CAST(1 AS INT), CAST(2 AS INT))) AND 
> timestamp_col IS NOT NULL)
> |  mem-estimate=16B mem-reservation=0B thread-reservation=0
> |  tuple-ids=0,2 row-size=96B cardinality=11.95K
> |  in pipelines: 00(GETNEXT), 02(OPEN){noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to