[ https://issues.apache.org/jira/browse/HIVE-13884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15328764#comment-15328764 ]

Mohit Sabharwal commented on HIVE-13884:
----------------------------------------

Since we are moving the functionality from the driver to the HMS, should we 
deprecate {{hive.limit.query.max.table.partition}} and introduce a new config 
called {{hive.metastore.retrieve.max.partitions}}?

All metastore configs have the "hive.metastore" prefix.

Otherwise:
1) The change is backward incompatible for existing users who set this config
at the HS2 level and would now be expected to set it at the HMS level to get
the same functionality.
2) The name would be confusing.

We could do the following:
1) Mark {{hive.limit.query.max.table.partition}} as deprecated in HiveConf and 
suggest that users move to {{hive.metastore.retrieve.max.partitions}} at the 
HMS level.
2) Keep the current functionality associated with 
{{hive.limit.query.max.table.partition}} in PartitionPruner.
It does what the description promises - i.e. it fails the query before the 
execution stage if the number of partitions associated with any scan operator 
exceeds the configured value.
3) Add a new config {{hive.metastore.retrieve.max.partitions}} to configure the 
functionality in this patch (a rough sketch of both config entries follows 
below).
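
Roughly what I have in mind on the HiveConf side - this is only an illustrative 
sketch for discussion, not the actual HiveConf code; the enum/entry names, 
defaults, and descriptions are placeholders:

{code:java}
// Sketch only: shows the deprecated HS2-level entry alongside the proposed
// HMS-level entry. Real HiveConf.ConfVars entries would carry more fields.
public enum ProposedConfVars {

  // Existing HS2-level config, kept working but marked deprecated in its description.
  HIVELIMITTABLESCANPARTITION("hive.limit.query.max.table.partition", -1,
      "Deprecated: set hive.metastore.retrieve.max.partitions at the HMS level instead. "
      + "Fails a query before execution if any table scan operator would read "
      + "more than this many partitions; -1 means no limit."),

  // Proposed new HMS-level config carrying the "hive.metastore" prefix.
  METASTORE_RETRIEVE_MAX_PARTITIONS("hive.metastore.retrieve.max.partitions", -1,
      "Maximum number of partitions the metastore returns for a single request; "
      + "-1 means no limit.");

  public final String varname;
  public final int defaultIntVal;
  public final String description;

  ProposedConfVars(String varname, int defaultIntVal, String description) {
    this.varname = varname;
    this.defaultIntVal = defaultIntVal;
    this.description = description;
  }
}
{code}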

Makes sense?

> Disallow queries fetching more than a configured number of partitions in 
> PartitionPruner
> ----------------------------------------------------------------------------------------
>
>                 Key: HIVE-13884
>                 URL: https://issues.apache.org/jira/browse/HIVE-13884
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Mohit Sabharwal
>            Assignee: Sergio Peña
>         Attachments: HIVE-13884.1.patch, HIVE-13884.2.patch, 
> HIVE-13884.3.patch
>
>
> Currently, the PartitionPruner requests either all partitions or partitions 
> based on a filter expression. In either scenario, if the number of partitions 
> accessed is large, there can be significant memory pressure on the HMS 
> server.
> We already have a config {{hive.limit.query.max.table.partition}} that 
> enforces a limit on the number of partitions that may be scanned per 
> operator, but this check happens after the PartitionPruner has already 
> fetched all partitions.
> We should add an option at the PartitionPruner level to disallow queries that 
> attempt to access a number of partitions beyond a configurable limit.
> Note that {{hive.mapred.mode=strict}} disallows queries without a partition 
> filter in PartitionPruner, but this check accepts any query with a pruning 
> condition, even if the number of partitions fetched is large. In multi-tenant 
> environments, admins could use more control over the number of partitions 
> allowed, based on HMS memory capacity.
> One option is to have the PartitionPruner first fetch the partition names 
> (instead of partition specs) and throw an exception if the number of 
> partitions exceeds the configured value. Otherwise, fetch the partition 
> specs.
> It looks like the existing {{listPartitionNames}} call could be used if 
> extended to take partition filter expressions like the {{getPartitionsByExpr}} 
> call does.
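
A rough sketch of the name-count check described above. Only 
{{IMetaStoreClient.listPartitionNames}} is an existing call; the class and 
method names here are illustrative, and the filter-expression variant mentioned 
above would still need to be added:

{code:java}
import java.util.List;

import org.apache.hadoop.hive.metastore.IMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.MetaException;

// Illustrative sketch of the proposed check: count partition names (cheap)
// before fetching full partition specs (expensive). Class/method names and the
// config wiring are placeholders, not the actual patch.
public class PartitionFetchLimitCheck {

  public static void checkPartitionLimit(IMetaStoreClient msc, String dbName,
      String tableName, int maxPartitions) throws Exception {
    if (maxPartitions < 0) {
      return; // a negative value (e.g. -1) disables the limit
    }
    // Partition names are much cheaper to fetch than full partition objects;
    // a filter-expression variant of this call would be needed for filtered scans.
    List<String> names = msc.listPartitionNames(dbName, tableName, (short) -1);
    if (names.size() > maxPartitions) {
      throw new MetaException("Query would fetch " + names.size() + " partitions of "
          + dbName + "." + tableName + ", which exceeds the configured limit of "
          + maxPartitions);
    }
    // Otherwise the caller proceeds to fetch the actual partition specs,
    // e.g. via getPartitionsByExpr for filtered queries.
  }
}
{code}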


