[
https://issues.apache.org/jira/browse/HIVE-21354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17080592#comment-17080592
]
Peter Vary commented on HIVE-21354:
-----------------------------------
[~belugabehr]: I do not get it. Even stranger:
{code}
explain locks select * from web_logs where `date`='2015-11-18'
Explain
LOCK INFORMATION:
default.web_logs -> SHARED_READ
default.web_logs.date=2015-11-18 -> SHARED_READ
{code}
It seems the assumption is that lock checks only look for exact matches.
We should double-check that we really do prevent a query from getting a shared
lock on a partition while another query holds an exclusive lock on the table.
[~dkuzmenko] can help us here :)
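To make the concern concrete, here is a minimal Python sketch of a conflict check that walks the table/partition hierarchy instead of comparing lock names exactly. It is not Hive's lock manager: the `Lock` class and the `overlaps` and `conflicts` functions are hypothetical names, and only the two modes that appear in the explain output are modeled.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

SHARED_READ = "SHARED_READ"
EXCLUSIVE = "EXCLUSIVE"


@dataclass(frozen=True)
class Lock:
    """Hypothetical lock record; partition=None means a table-level lock."""
    db: str
    table: str
    partition: Optional[str]
    mode: str


def overlaps(a: Lock, b: Lock) -> bool:
    """True if the two locks cover at least one common object."""
    if (a.db, a.table) != (b.db, b.table):
        return False
    # A table-level lock covers every partition of that table.
    if a.partition is None or b.partition is None:
        return True
    return a.partition == b.partition


def conflicts(requested: Lock, held: Iterable[Lock]) -> bool:
    """True if the request must wait: some overlapping lock exists and
    at least one side of the pair is EXCLUSIVE."""
    return any(
        overlaps(requested, h) and EXCLUSIVE in (requested.mode, h.mode)
        for h in held
    )
```

Under this sketch, a SHARED_READ request on `default.web_logs.date=2015-11-18` has to wait while another query holds EXCLUSIVE on `default.web_logs`. A check that compares only exact lock names would let it through.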
Just a fun fact:
{code}
explain locks alter table web_logs drop partition(`date`='2015-11-18')
LOCK INFORMATION:
default.web_logs.date=2015-11-18 -> EXCLUSIVE
{code}
This might merit another Jira (or should we do it here?): do not request
unnecessary locks (why do we request a full table lock for a select?). In the
current state we would block dropping a partition even if the query does not
use it.
> Lock The Entire Table If Majority Of Partitions Are Locked
> ----------------------------------------------------------
>
> Key: HIVE-21354
> URL: https://issues.apache.org/jira/browse/HIVE-21354
> Project: Hive
> Issue Type: Improvement
> Components: HiveServer2
> Affects Versions: 4.0.0, 3.2.0
> Reporter: David Mollitor
> Assignee: David Mollitor
> Priority: Major
>
> One of the bottlenecks of any Hive query is the ZooKeeper locking mechanism.
> When a Hive query interacts with a table which has a lot of partitions, this
> may put a lot of stress on the ZK system.
> Please add a heuristic that works like this:
> # Count the number of partitions that a query is required to lock
> # Obtain the total number of partitions in the table
> # If the number of partitions accessed by the query is greater than or equal
> to half the total number of partitions, simply create one ZNode lock at the
> table level.
> This would improve performance of many queries, but in particular, a {{select
> count(1) from table}} ... or ... {{select * from table limit 5}} where the
> table has many partitions.
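The heuristic in the description can be sketched in a few lines of Python. This is only an illustration of the three steps listed there, not a patch: `locks_to_request` is a hypothetical function, and the lock names simply follow the `db.table.partition` form shown in the explain output.

```python
from typing import Iterable, List


def locks_to_request(table: str,
                     accessed_partitions: Iterable[str],
                     total_partitions: int) -> List[str]:
    """Return the lock names a query would request under the heuristic.

    table               -- qualified table name, e.g. "default.web_logs"
    accessed_partitions -- partition specs the query must lock
    total_partitions    -- number of partitions the table has
    """
    # Step 1: count the distinct partitions the query has to lock.
    accessed = sorted(set(accessed_partitions))
    # Steps 2 and 3: compare against the table's total. When the query
    # touches at least half of the partitions, one table-level lock
    # replaces all partition-level locks.
    if 2 * len(accessed) >= total_partitions:
        return [table]
    return [f"{table}.{p}" for p in accessed]
```

For a table with four partitions, a query touching one partition still gets a single partition lock, while a query touching two or more gets one table-level lock. An unpartitioned table (zero partitions) always falls through to the table-level lock.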