[ 
https://issues.apache.org/jira/browse/HIVE-7826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119743#comment-14119743
 ] 

Gunther Hagleitner commented on HIVE-7826:
------------------------------------------

Committed to branch. Thanks [~vikram.dixit]. [~damien.carol] thanks for trying 
it out. Let me know if you're still having problems with this. I'll address in 
follow up if need be.

> Dynamic partition pruning on Tez
> --------------------------------
>
>                 Key: HIVE-7826
>                 URL: https://issues.apache.org/jira/browse/HIVE-7826
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Gunther Hagleitner
>            Assignee: Gunther Hagleitner
>              Labels: TODOC14, tez
>         Attachments: HIVE-7826.1.patch, HIVE-7826.2.patch, HIVE-7826.3.patch, 
> HIVE-7826.4.patch, HIVE-7826.5.patch, HIVE-7826.6.patch, HIVE-7826.7.patch
>
>
> It's natural in a star schema to map one or more dimensions to partition 
> columns. Time or location are likely candidates. 
> It can also useful to be to compute the partitions one would like to scan via 
> a subquery (where p in select ... from ...).
> The resulting joins in hive require a full table scan of the large table 
> though, because partition pruning takes place before the corresponding values 
> are known.
> On Tez it's relatively straight forward to send the values needed to prune to 
> the application master - where splits are generated and tasks are submitted. 
> Using these values we can strip out any unneeded partitions dynamically, 
> while the query is running.
> The approach is straight forward:
> - Insert synthetic conditions for each join representing "x in (keys of other 
> side in join)"
> - This conditions will be pushed as far down as possible
> - If the condition hits a table scan and the column involved is a partition 
> column:
>    - Setup Operator to send key events to AM
> - else:
>    - Remove synthetic predicate
> Add  these properties :
> ||Property||Default Value||
> |{{hive.tez.dynamic.partition.pruning}}|true|
> |{{hive.tez.dynamic.partition.pruning.max.event.size}}|1*1024*1024L|
> |{{hive.tez.dynamic.parition.pruning.max.data.size}}|100*1024*1024L|



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to