[ 
https://issues.apache.org/jira/browse/HIVE-15269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16335426#comment-16335426
 ] 

Deepak Jaiswal commented on HIVE-15269:
---------------------------------------

The code you showed is a placeholder for the min, max and bloom filter values.

The 2nd GBY->RS calculates the final min, max and bloom filter by aggregating 
all the partial min, max and bloom filter values produced by the 1st branch.
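
As a rough illustration of the two-stage aggregation (this is not Hive's 
actual code; the names Partial, partial_aggregate, final_aggregate and 
might_contain, as well as the filter size and hash scheme, are made up for 
the sketch), the partial results can be merged like this:

```python
import hashlib
from dataclasses import dataclass

NUM_BITS = 1024   # assumed bloom filter size, chosen only for the sketch
NUM_HASHES = 3    # assumed number of hash positions per value


def _positions(value):
    """Derive NUM_HASHES bit positions for a value from one SHA-256 digest."""
    digest = hashlib.sha256(str(value).encode()).digest()
    return [
        int.from_bytes(digest[4 * i:4 * i + 4], "big") % NUM_BITS
        for i in range(NUM_HASHES)
    ]


@dataclass
class Partial:
    lo: int    # min of the values seen
    hi: int    # max of the values seen
    bits: int  # bloom filter stored as an integer bitmask


def partial_aggregate(values):
    """1st stage: each task summarizes its own (non-empty) share of values."""
    values = list(values)
    bits = 0
    for v in values:
        for p in _positions(v):
            bits |= 1 << p
    return Partial(min(values), max(values), bits)


def final_aggregate(partials):
    """2nd stage: min of mins, max of maxes, bitwise OR of bloom filters."""
    partials = list(partials)
    bits = 0
    for part in partials:
        bits |= part.bits
    return Partial(
        min(part.lo for part in partials),
        max(part.hi for part in partials),
        bits,
    )


def might_contain(f, value):
    """False means the value is definitely absent; True means maybe present."""
    if not (f.lo <= value <= f.hi):
        return False
    return all((f.bits >> p) & 1 for p in _positions(value))
```

Because min, max and bitwise OR are all associative, merging the partial 
results gives the same answer as aggregating every value in a single task.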

Please refer to this test and its result file for explain plans. The groupby 
has 3 columns, namely, min, max and bloom_filter.

I hope it helps.

> Dynamic Min-Max/BloomFilter runtime-filtering for Tez
> -----------------------------------------------------
>
>                 Key: HIVE-15269
>                 URL: https://issues.apache.org/jira/browse/HIVE-15269
>             Project: Hive
>          Issue Type: New Feature
>          Components: Tez
>            Reporter: Jason Dere
>            Assignee: Deepak Jaiswal
>            Priority: Major
>              Labels: TODOC2.2.0
>             Fix For: 2.2.0
>
>         Attachments: HIVE-15269.1.patch, HIVE-15269.10.patch, 
> HIVE-15269.11.patch, HIVE-15269.12.patch, HIVE-15269.13.patch, 
> HIVE-15269.14.patch, HIVE-15269.15.patch, HIVE-15269.16.patch, 
> HIVE-15269.17.patch, HIVE-15269.18.patch, HIVE-15269.19.patch, 
> HIVE-15269.2.patch, HIVE-15269.3.patch, HIVE-15269.4.patch, 
> HIVE-15269.5.patch, HIVE-15269.6.patch, HIVE-15269.7.patch, 
> HIVE-15269.8.patch, HIVE-15269.9.patch
>
>
> If a dimension table and fact table are joined:
> {noformat}
> select *
> from store join store_sales on (store.id = store_sales.store_id)
> where store.s_store_name = 'My Store'
> {noformat}
> One optimization that can be done is to get the min/max store id values that 
> come out of the scan/filter of the store table, and send this min/max value 
> (via Tez edge) to the task which is scanning the store_sales table.
> We can add a BETWEEN(min, max) predicate to the store_sales TableScan, where 
> this predicate can be pushed down to the storage handler (for example for ORC 
> formats). Pushing a min/max predicate to the ORC reader would allow us to 
> avoid having to read entire row groups during the table scan.
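
A minimal sketch of why a min/max predicate lets a reader skip row groups, 
assuming each row group keeps min/max statistics for the join column. This is 
not the ORC reader's API; RowGroup and scan_with_range are hypothetical names 
used only for illustration:

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RowGroup:
    min_id: int          # smallest store_id in the group (stored statistic)
    max_id: int          # largest store_id in the group (stored statistic)
    store_ids: List[int]  # the store_id values themselves


def scan_with_range(groups: List[RowGroup], lo: int, hi: int) -> Tuple[List[int], int]:
    """Return the matching store_ids and the number of row groups read."""
    matched = []
    groups_read = 0
    for g in groups:
        # Statistics show that no row can satisfy BETWEEN(lo, hi): skip it.
        if g.max_id < lo or g.min_id > hi:
            continue
        groups_read += 1
        matched.extend(v for v in g.store_ids if lo <= v <= hi)
    return matched, groups_read
```

With three row groups covering ids 1-10, 11-20 and 21-30, a runtime range of 
12 to 18 only requires reading the middle group.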



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
