[
https://issues.apache.org/jira/browse/HIVE-11634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14939100#comment-14939100
]
Hari Sankar Sivarama Subramaniyan commented on HIVE-11634:
----------------------------------------------------------
[~jcamachorodriguez] Thanks for the feedback.
1. Changes to groupby_cube1.q do not seem part of this patch?
That's true; I reverted the change in the new patch.
2. In pcs.q.out, query in line 666:
explain extended select a.ds, b.key from pcs_t1 a, pcs_t1 b where struct(a.ds,
a.key, b.ds) in (struct('2000-04-08',1, '2000-04-09'), struct('2000-04-09',2,
'2000-04-08'))
Additional predicate is not derived, and thus partition pruning is not
happening: we read partitions '2000-04-08', '2000-04-09', and '2000-04-10'. Any
idea why this is happening? Could you check that case?
I checked this, and it seems to happen only in the case of a shuffle join; I am
still investigating. For a map join, this works fine, and I have modified the
test case accordingly.
3. We still do not seem to be removing the predicates that are used for
partition pruning properly from the Filter predicates e.g. pointlookup2.q.out
or pointlookup3.q.out. I think this patch should take care of that too?
That's true. I debugged this, and it does go through the change I introduced in
PcrExprProcFactory.java, which should have removed the extra filter predicates.
I am surprised that this doesn't happen for this particular scenario. Would it
be OK to cover this in a follow-up JIRA, since this is not a regression from
the baseline?
4. we were prepending a new conjunction to the original predicate for
non-partition columns if we were reducing the NDV in the IN clause. Do you
think it would be easy to extend your patch to cover this case too?
I think this might require more changes than the initial work, for two reasons:
(1) in the current patch I don't necessarily separate each and every column; I
club the partition columns into the same struct when possible. (2) I need to
let the PCR know that this additional predicate should not be removed if this
is a partition column and contributed to reducing the NDV.
Thanks
Hari
> Support partition pruning for IN(STRUCT(partcol, nonpartcol..)...)
> ------------------------------------------------------------------
>
> Key: HIVE-11634
> URL: https://issues.apache.org/jira/browse/HIVE-11634
> Project: Hive
> Issue Type: Bug
> Components: CBO
> Reporter: Hari Sankar Sivarama Subramaniyan
> Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-11634.1.patch, HIVE-11634.2.patch,
> HIVE-11634.3.patch, HIVE-11634.4.patch, HIVE-11634.5.patch,
> HIVE-11634.6.patch, HIVE-11634.7.patch, HIVE-11634.8.patch,
> HIVE-11634.9.patch, HIVE-11634.91.patch, HIVE-11634.92.patch,
> HIVE-11634.93.patch, HIVE-11634.94.patch, HIVE-11634.95.patch,
> HIVE-11634.96.patch
>
>
> Currently, we do not support partition pruning for the following scenario:
> {code}
> create table pcr_t1 (key int, value string) partitioned by (ds string);
> insert overwrite table pcr_t1 partition (ds='2000-04-08') select * from src
> where key < 20 order by key;
> insert overwrite table pcr_t1 partition (ds='2000-04-09') select * from src
> where key < 20 order by key;
> insert overwrite table pcr_t1 partition (ds='2000-04-10') select * from src
> where key < 20 order by key;
> explain extended select ds from pcr_t1 where struct(ds, key) in
> (struct('2000-04-08',1), struct('2000-04-09',2));
> {code}
> If we run the above query, we see that all the partitions of table pcr_t1 are
> present in the filter predicate, whereas we could prune partition
> (ds='2000-04-10').
> The optimization is to rewrite the above query into the following.
> {code}
> explain extended select ds from pcr_t1 where (struct(ds)) IN
> (struct('2000-04-08'), struct('2000-04-09')) and struct(ds, key) in
> (struct('2000-04-08',1), struct('2000-04-09',2));
> {code}
> The predicate (struct(ds)) IN (struct('2000-04-08'), struct('2000-04-09'))
> is used by the partition pruner to prune partitions that would otherwise not
> be pruned.
> This is an extension of the idea presented in HIVE-11573.
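
The rewrite described above can be illustrated outside Hive with a small
Python sketch. This is not code from the patch: the function names
(derive_partition_predicate, prune) and the dict-based partition
representation are hypothetical, chosen only to show the idea of projecting
each IN-list struct onto the partition columns and using the result to discard
partitions.

```python
from typing import Dict, List, Sequence, Set, Tuple


def derive_partition_predicate(
    struct_cols: Sequence[str],
    partition_cols: Set[str],
    in_list: Sequence[Tuple],
) -> Tuple[List[str], List[Tuple]]:
    """Project every IN-list tuple onto the partition columns.

    Returns the partition columns found in the struct (in struct order)
    and the de-duplicated projected tuples (in first-seen order).
    Hypothetical helper, not part of Hive.
    """
    # Positions of the struct fields that are partition columns.
    idx = [i for i, col in enumerate(struct_cols) if col in partition_cols]
    part_cols = [struct_cols[i] for i in idx]

    seen = set()
    projected = []
    for row in in_list:
        key = tuple(row[i] for i in idx)
        if key not in seen:
            seen.add(key)
            projected.append(key)
    return part_cols, projected


def prune(
    partitions: Sequence[Dict[str, str]],
    part_cols: Sequence[str],
    projected: Sequence[Tuple],
) -> List[Dict[str, str]]:
    """Keep only the partitions whose values appear in the derived predicate."""
    allowed = set(projected)
    return [p for p in partitions if tuple(p[c] for c in part_cols) in allowed]


# struct(ds, key) IN (struct('2000-04-08', 1), struct('2000-04-09', 2))
cols, derived = derive_partition_predicate(
    ["ds", "key"],
    {"ds"},
    [("2000-04-08", 1), ("2000-04-09", 2)],
)
kept = prune(
    [{"ds": "2000-04-08"}, {"ds": "2000-04-09"}, {"ds": "2000-04-10"}],
    cols,
    derived,
)
```

Here the derived predicate corresponds to (struct(ds)) IN
(struct('2000-04-08'), struct('2000-04-09')), and the partition
ds='2000-04-10' is dropped. The original struct(ds, key) predicate still has
to be evaluated on the surviving rows, which is why the rewritten query keeps
both conjuncts.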
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)