[
https://issues.apache.org/jira/browse/HIVE-26135?focusedWorklogId=755848&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-755848
]
ASF GitHub Bot logged work on HIVE-26135:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 12/Apr/22 15:44
Start Date: 12/Apr/22 15:44
Worklog Time Spent: 10m
Work Description: kgyrtkirk opened a new pull request, #3205:
URL: https://github.com/apache/hive/pull/3205
<!--
Thanks for sending a pull request! Here are some tips for you:
1. If this is your first time, please read our contributor guidelines:
https://cwiki.apache.org/confluence/display/Hive/HowToContribute
2. Ensure that you have created an issue on the Hive project JIRA:
https://issues.apache.org/jira/projects/HIVE/summary
3. Ensure you have added or run the appropriate tests for your PR:
4. If the PR is unfinished, add '[WIP]' in your PR title, e.g.,
'[WIP]HIVE-XXXXX: Your PR title ...'.
5. Be sure to keep the PR description updated to reflect all changes.
6. Please write your PR title to summarize what this PR proposes.
7. If possible, provide a concise example to reproduce the issue for a
faster review.
-->
### What changes were proposed in this pull request?
restricts the optimization to only happen in case `X IS NULL` is present and
`X` is Strong and there is no `CAST` in `X`.
...in fact in Hive `CAST` is not `Strong`; but its not possible to change
how `CAST` is treated in the current version of Calcite
### Why are the changes needed?
without this patch incorrect optimaztion may happen
### Does this PR introduce _any_ user-facing change?
not really - it only makes the optimization less zealous
### How was this patch tested?
a qtest was added to cover for it
Issue Time Tracking
-------------------
Worklog Id: (was: 755848)
Remaining Estimate: 0h
Time Spent: 10m
> Invalid Anti join conversion may cause missing results
> ------------------------------------------------------
>
> Key: HIVE-26135
> URL: https://issues.apache.org/jira/browse/HIVE-26135
> Project: Hive
> Issue Type: Bug
> Reporter: Zoltan Haindrich
> Assignee: Zoltan Haindrich
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> right now I think the following is needed to trigger the issue:
> * left outer join
> * only select left hand side columns
> * conditional which is using some udf
> * the nullness of the udf is checked
> repro sql; in case the conversion happens the row with 'a' will be missing
> {code}
> drop table if exists t;
> drop table if exists n;
> create table t(a string) stored as orc;
> create table n(a string) stored as orc;
> insert into t values ('a'),('1'),('2'),(null);
> insert into n values ('a'),('b'),('1'),('3'),(null);
> explain select n.* from n left outer join t on (n.a=t.a) where
> assert_true(t.a is null) is null;
> explain select n.* from n left outer join t on (n.a=t.a) where cast(t.a as
> float) is null;
> select n.* from n left outer join t on (n.a=t.a) where cast(t.a as float) is
> null;
> set hive.auto.convert.anti.join=false;
> select n.* from n left outer join t on (n.a=t.a) where cast(t.a as float) is
> null;
> {code}
> workaround could be to disable the feature:
> {code}
> set hive.auto.convert.anti.join=false;
> {code}
--
This message was sent by Atlassian Jira
(v8.20.1#820001)