[ 
https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916358#comment-13916358
 ] 

Gunther Hagleitner commented on HIVE-6492:
------------------------------------------

Thanks, Selina. Just trying to understand the requirements to see what's the 
best way to get this in.

One question is whether you can deploy different configs in these scenarios. 
E.g.: use a different site file if someone is starting Hive on the console vs. 
through tools. Or use an alias to add a --hiveconf on the node where users 
start Hive. You're trying to protect the cluster from large jobs. In your case 
you seem to want to turn this on for certain interfaces and off for others, but 
for other deployments that might not make much sense (the interface 
(ODBC/JDBC/CLI) doesn't say whether it's a human, a tool, etc.).

But specifically:

1) What's "small"? Sounds like if a query doesn't submit a job you want to 
let it go through? Or only if there's an explicit limit clause?
2) That's the same as 1 - if you just check for "no job started".
3) Aggregation on a partition key right now will scan the entire table in a 
massive map-red job. Definitely something that should be fixed - but there's no 
optimization for that yet afaik. Allowing this query seems to defeat the 
purpose of this flag, doesn't it? Seems like again you just want to check 
for "no job started".

With that - it would make sense to update/extend the hive.mapred.mode variable 
to allow for queries that don't actually start a job (and allow jobs only with 
explicit partition pruning). With that change plus different configs for 
different interfaces, you should get all that you want, and it would be 
simpler. Correct?
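
To make the proposed rule concrete, here is a rough sketch in Python (not 
Hive's Java, and not taken from any patch). QueryPlan, its fields, and 
is_allowed are hypothetical names used only for illustration. How jobs over 
unpartitioned tables should be treated is not covered above, so the sketch 
applies the rule literally.

```python
from dataclasses import dataclass


@dataclass
class QueryPlan:
    """Hypothetical summary of a compiled query; not an actual Hive class."""
    starts_job: bool            # would the query launch a map-red job?
    has_partition_filter: bool  # does it prune partitions explicitly?


def is_allowed(plan: QueryPlan) -> bool:
    """Apply the suggested rule: let through anything that starts no job,
    and let jobs through only when partitions are pruned explicitly."""
    if not plan.starts_job:
        return True
    return plan.has_partition_filter


# A query answered without a job passes; a full scan that starts a job does not.
print(is_allowed(QueryPlan(starts_job=False, has_partition_filter=False)))
print(is_allowed(QueryPlan(starts_job=True, has_partition_filter=False)))
```

Under this rule the three cases above collapse into the single "no job 
started" check, plus the existing partition-pruning requirement for anything 
that does start a job.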

> limit partition number involved in a table scan
> -----------------------------------------------
>
>                 Key: HIVE-6492
>                 URL: https://issues.apache.org/jira/browse/HIVE-6492
>             Project: Hive
>          Issue Type: New Feature
>          Components: Query Processor
>    Affects Versions: 0.12.0
>            Reporter: Selina Zhang
>             Fix For: 0.13.0
>
>         Attachments: HIVE-6492.1.patch.txt
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> To protect the cluster, a new configuration variable 
> "hive.limit.query.max.table.partition" is added to the Hive configuration to
> limit the number of table partitions involved in a table scan. 
> The default value will be set to -1, which means there is no limit by default. 
> This variable will not affect "metadata only" queries.
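
The behaviour in the issue description above can be sketched roughly as 
follows, in Python rather than Hive's Java. The function, its parameters, and 
the exception name are hypothetical. Only the -1 default and the "metadata 
only" exemption come from the description.

```python
class PartitionLimitExceeded(Exception):
    """Hypothetical error for a scan that touches too many partitions."""


def check_partition_limit(num_partitions: int,
                          max_partitions: int = -1,
                          metadata_only: bool = False) -> None:
    """Raise if a table scan would read more partitions than allowed.

    max_partitions stands in for hive.limit.query.max.table.partition;
    -1 (the default) means no limit.
    """
    if metadata_only:
        # The description says "metadata only" queries are not affected.
        return
    if max_partitions == -1:
        # No limit configured.
        return
    if num_partitions > max_partitions:
        raise PartitionLimitExceeded(
            f"scan touches {num_partitions} partitions, "
            f"limit is {max_partitions}"
        )


# With the default of -1, even a very wide scan is accepted.
check_partition_limit(num_partitions=10_000)
```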



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
