[ https://issues.apache.org/jira/browse/HIVE-14652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15438371#comment-15438371 ]
Sergey Shelukhin edited comment on HIVE-14652 at 8/26/16 3:02 AM:
------------------------------------------------------------------
The fix (and also a refactor of the class so it no longer has a million-line method).
I have a vague feeling that most of the logic in this method is bogus, but that
may just be because I am missing something, since it apparently works. The main
question is: why do we evaluate UDFs on partition values from the pruned set for
the filters that we purport to remove? We have just used the same filters to
prune the partitions, so one of two things should be true: either we cannot
eliminate the filter, or the final result of all the expressions is known to be
true (or not to matter). So we could bail out immediately as soon as we see any
disagreement after evaluation, or have a walk state that indicates the value
doesn't matter.
I don't really know if that's the case or if I'm missing something here.
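The "bail out on disagreement" idea can be modelled roughly as below. This is a minimal Python sketch, not Hive's optimizer code: it assumes a partition is a dict of partition-column values and that a predicate returns True/False, or None when it cannot be decided from partition values alone. The names `prune` and `decide_filter` are hypothetical, and the real code walks an expression tree rather than calling a single function.

```python
# Hypothetical model: prune partitions with a filter, then drop the filter only
# if it is known to be true on every surviving partition.
# Assumption: predicates return True/False, or None if undecidable from
# partition values (e.g. the filter references a non-partition column).
from typing import Callable, Dict, List, Optional

Partition = Dict[str, str]
Predicate = Callable[[Partition], Optional[bool]]


def prune(partitions: List[Partition], predicate: Predicate) -> List[Partition]:
    """Keep partitions where the predicate is true or undecidable."""
    return [p for p in partitions if predicate(p) is not False]


def decide_filter(predicate: Predicate, surviving: List[Partition]) -> str:
    """Return 'remove' if the predicate is true on every surviving partition;
    otherwise bail out at the first disagreement and return 'keep'."""
    for part in surviving:
        if predicate(part) is not True:
            return "keep"
    return "remove"


parts = [{"s": "foo"}, {"s": "bar"}]
not_in_bar: Predicate = lambda p: p["s"] not in ("bar",)

surviving = prune(parts, not_in_bar)
print(surviving)                             # [{'s': 'foo'}]
print(decide_filter(not_in_bar, surviving))  # remove
```

Because pruning used the same predicate, evaluating it again on the survivors can only confirm "true" or report "undecidable", which is the redundancy questioned above.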
So for now the fix is to change the new IN logic introduced by HIVE-11424 to
follow the same twisted logic.
Let's see what that breaks.
The problem is that HIVE-11424 changes IN to true if there's a column on the
left side. But, as described above, this IN was also used to prune the
partitions, so for every partition that survives pruning in the NOT IN case, the
IN is guaranteed to be false. The "regular" logic would have confirmed that and
then applied NOT to the false constant (yielding true), whereas the current code
applies NOT to the true constant, which yields false and filters out every row.
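To make the failure concrete, here is a minimal Python sketch of `WHERE s NOT IN ('bar')` under both behaviours. It is a model, not Hive code: `in_folded_to_true`, `in_evaluated` and `not_in_query` are hypothetical names, and the row values are stand-ins for the `cint` values in the repro.

```python
# Hypothetical model of the NOT IN bug on a partition column s.
# Assumption: each partition value maps to a list of stand-in row values.
from typing import Callable, Dict, List, Sequence

InImpl = Callable[[str, Sequence[str]], bool]


def in_folded_to_true(value: str, values: Sequence[str]) -> bool:
    # Shortcut described above: IN with a column on the left becomes TRUE.
    return True


def in_evaluated(value: str, values: Sequence[str]) -> bool:
    # "Regular" logic: evaluate IN on the surviving partition's actual value.
    return value in values


def not_in_query(partitions: Dict[str, List[int]],
                 values: Sequence[str], in_impl: InImpl) -> List[int]:
    """Simulate `WHERE s NOT IN (values)` with pruning plus a residual filter."""
    result: List[int] = []
    for value, rows in partitions.items():
        if value in values:
            continue  # pruned: NOT IN is false for this partition
        # Residual filter NOT(IN(...)), with IN computed by in_impl.
        if not in_impl(value, values):
            result.extend(rows)
    return result


data = {"foo": [1, 2, 3], "bar": [4, 5, 6]}
print(not_in_query(data, ("bar",), in_folded_to_true))  # [] (no results)
print(not_in_query(data, ("bar",), in_evaluated))       # [1, 2, 3]
```

Folding IN to true turns the residual NOT IN into false for the one partition that survived pruning, which matches the empty result in the report.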
cc [~jcamachorodriguez] [~ashutoshc]
EDIT: I think the old IN logic for a UDF on the left-hand side might also be
broken in the same way; I need to take a look.
> incorrect results for not in on partition columns
> -------------------------------------------------
>
> Key: HIVE-14652
> URL: https://issues.apache.org/jira/browse/HIVE-14652
> Project: Hive
> Issue Type: Bug
> Affects Versions: 2.1.0, 2.2.0
> Reporter: stephen sprague
> Assignee: Sergey Shelukhin
> Priority: Blocker
> Attachments: HIVE-14652.patch
>
>
> {noformat}
> create table foo (i int) partitioned by (s string);
> insert overwrite table foo partition(s='foo') select cint from alltypesorc limit 10;
> insert overwrite table foo partition(s='bar') select cint from alltypesorc limit 10;
> select * from foo where s not in ('bar');
> {noformat}
> No results. The same query with IN (instead of NOT IN) works correctly.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)