[ 
https://issues.apache.org/jira/browse/HIVE-11634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14939100#comment-14939100
 ] 

Hari Sankar Sivarama Subramaniyan commented on HIVE-11634:
----------------------------------------------------------

[~jcamachorodriguez]  Thanks for the feedback.
1. Changes to groupby_cube1.q do not seem part of this patch?
That's true; I reverted the change in the new patch.

2. In pcs.q.out, query in line 666:
explain extended select a.ds, b.key from pcs_t1 a, pcs_t1 b where struct(a.ds, 
a.key, b.ds) in (struct('2000-04-08',1, '2000-04-09'), struct('2000-04-09',2, 
'2000-04-08'))
Additional predicate is not derived, and thus partition pruning is not 
happening: we read partitions '2000-04-08', '2000-04-09', and '2000-04-10'. Any 
idea why this is happening? Could you check that case?
I checked this, and it seems to happen only in the shuffle join case; I am 
still investigating. For map join, this works fine, and I have modified the 
test case accordingly.

3. We still do not seem to be removing the predicates that are used for 
partition pruning properly from the Filter predicates e.g. pointlookup2.q.out 
or pointlookup3.q.out. I think this patch should take care of that too?

That's true. I debugged this, and execution does go through the change I 
introduced in PcrExprProcFactory.java, which should have removed the extra 
filter predicates. I am not sure why that doesn't happen for this particular 
scenario. Would it be OK to cover this in a follow-up JIRA, since this is not 
a regression from the baseline?

4. We were prepending a new conjunction to the original predicate for 
non-partition columns if we were reducing the NDV in the IN clause. Do you 
think it would be easy to extend your patch to cover this case too? 

I think this might require more changes than the initial work, for two 
reasons: (1) in the current patch I don't necessarily separate each and every 
column; I group the partition columns into the same struct when possible. 
(2) I need to let the PCR know that the additional predicate should not be 
removed if it is on a partition column and contributed to reducing the NDV. 

Thanks
Hari

> Support partition pruning for IN(STRUCT(partcol, nonpartcol..)...)
> ------------------------------------------------------------------
>
>                 Key: HIVE-11634
>                 URL: https://issues.apache.org/jira/browse/HIVE-11634
>             Project: Hive
>          Issue Type: Bug
>          Components: CBO
>            Reporter: Hari Sankar Sivarama Subramaniyan
>            Assignee: Hari Sankar Sivarama Subramaniyan
>         Attachments: HIVE-11634.1.patch, HIVE-11634.2.patch, 
> HIVE-11634.3.patch, HIVE-11634.4.patch, HIVE-11634.5.patch, 
> HIVE-11634.6.patch, HIVE-11634.7.patch, HIVE-11634.8.patch, 
> HIVE-11634.9.patch, HIVE-11634.91.patch, HIVE-11634.92.patch, 
> HIVE-11634.93.patch, HIVE-11634.94.patch, HIVE-11634.95.patch, 
> HIVE-11634.96.patch
>
>
> Currently, we do not support partition pruning for the following scenario
> {code}
> create table pcr_t1 (key int, value string) partitioned by (ds string);
> insert overwrite table pcr_t1 partition (ds='2000-04-08') select * from src 
> where key < 20 order by key;
> insert overwrite table pcr_t1 partition (ds='2000-04-09') select * from src 
> where key < 20 order by key;
> insert overwrite table pcr_t1 partition (ds='2000-04-10') select * from src 
> where key < 20 order by key;
> explain extended select ds from pcr_t1 where struct(ds, key) in 
> (struct('2000-04-08',1), struct('2000-04-09',2));
> {code}
> If we run the above query, we see that all the partitions of table pcr_t1 are 
> present in the filter predicate, whereas we could prune partition 
> (ds='2000-04-10'). 
> The optimization is to rewrite the above query into the following.
> {code}
> explain extended select ds from pcr_t1 where  (struct(ds)) IN 
> (struct('2000-04-08'), struct('2000-04-09')) and  struct(ds, key) in 
> (struct('2000-04-08',1), struct('2000-04-09',2));
> {code}
> The predicate (struct(ds)) IN (struct('2000-04-08'), struct('2000-04-09')) 
> is used by the partition pruner to prune partitions that would otherwise not 
> be pruned.
> This is an extension of the idea presented in HIVE-11573.
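
The rewrite described above can be sketched outside of Hive as a projection of
each IN-list struct onto the partition columns. This is only an illustration
of the idea, not the code in the patch: the function names
derive_partition_predicate and render_in_predicate are hypothetical, and the
predicate is modelled as plain Python lists and tuples rather than Hive
expression nodes.

```python
def derive_partition_predicate(columns, tuples, partition_columns):
    """Project every IN-list struct onto the partition columns.

    columns           -- column names inside struct(...) on the left side
    tuples            -- the constant structs on the right side of IN
    partition_columns -- set of column names that are partition columns

    Returns (projected_columns, projected_tuples), or None when the struct
    contains no partition column and nothing can be derived.
    """
    keep = [i for i, name in enumerate(columns) if name in partition_columns]
    if not keep:
        return None
    projected, seen = [], set()
    for row in tuples:
        part = tuple(row[i] for i in keep)
        if part not in seen:  # drop duplicates, keep first-seen order
            seen.add(part)
            projected.append(part)
    return [columns[i] for i in keep], projected


def render_in_predicate(columns, tuples):
    """Render the predicate in the struct(...) IN (...) form used above."""
    def literal(value):
        return "'" + value + "'" if isinstance(value, str) else str(value)

    lhs = "struct(" + ", ".join(columns) + ")"
    rhs = ", ".join(
        "struct(" + ", ".join(literal(v) for v in row) + ")" for row in tuples
    )
    return lhs + " IN (" + rhs + ")"


# The example from the description: ds is the partition column, key is not.
cols, rows = derive_partition_predicate(
    ["ds", "key"],
    [("2000-04-08", 1), ("2000-04-09", 2)],
    {"ds"},
)
derived = render_in_predicate(cols, rows)
print(derived)
```

Any row that satisfies the original struct IN predicate also satisfies the
projected one, so the derived predicate can be ANDed with the original without
changing the query result; it only gives the partition pruner something it can
evaluate on partition columns alone.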



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
