[
https://issues.apache.org/jira/browse/HIVE-14979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15591664#comment-15591664
]
Peter Vary commented on HIVE-14979:
-----------------------------------
I totally agree with you [~sershe]!
Here is what I know at the moment, thanks to the folks who helped out with
extra info:
- There is a single configuration value for the ZooKeeper session timeout
(HIVE_ZOOKEEPER_SESSION_TIMEOUT), used by both service discovery and the locks.
It defaults to 20 minutes, and may be capped at a lower value by the ZooKeeper
server's maxSessionTimeout setting.
- If HiveServer2 is shut down normally, it removes its ZooKeeper nodes as
expected (at least I have yet to find an example that contradicts this).
- If HiveServer2 dies unexpectedly, ZooKeeper correctly removes the ephemeral
nodes, but only after the session timeout is reached. With the default
configuration that could take 20 minutes.
- The patch proposes a configuration option which, if enabled, removes the
leftover ZooKeeper lock nodes at HiveServer2 startup, even if the ZooKeeper
session timeout has not yet been reached.
- So far I have read one quite good reason for the large timeout (see the
comment by [~thejas], and
http://stackoverflow.com/questions/14275613/concerns-about-zookeepers-lock-recipe).
The session timeout relies on ping messages, so a long GC pause or network
congestion could cause the session to be terminated. ZooKeeper pings an idle
connection after 1/3 of the timeout, so the longer the timeout, the less likely
it is that a session is terminated overzealously :).
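To make the timeout arithmetic in the points above concrete, here is a rough
sketch (not Hive or ZooKeeper code). It assumes the usual ZooKeeper server
behaviour of clamping the requested session timeout between minSessionTimeout
and maxSessionTimeout, which I believe default to 2 and 20 ticks with a
2000 ms tickTime in the sample config. The function names are made up for
illustration.

```python
# Illustrative only: models how the effective session timeout and the idle
# ping interval relate. Defaults below are assumptions about a stock
# ZooKeeper server (tickTime=2000 ms, min=2 ticks, max=20 ticks).

HIVE_DEFAULT_SESSION_TIMEOUT_MS = 20 * 60 * 1000  # 20 minutes, per the comment


def negotiated_timeout_ms(requested_ms, tick_time_ms=2000,
                          min_session_ms=None, max_session_ms=None):
    """Clamp the client's requested timeout to the server's allowed range."""
    if min_session_ms is None:
        min_session_ms = 2 * tick_time_ms
    if max_session_ms is None:
        max_session_ms = 20 * tick_time_ms
    return max(min_session_ms, min(requested_ms, max_session_ms))


def idle_ping_interval_ms(session_timeout_ms):
    """An idle connection is pinged after roughly 1/3 of the session timeout."""
    return session_timeout_ms // 3


if __name__ == "__main__":
    # Stock server limits: the 20-minute request is capped by the server.
    capped = negotiated_timeout_ms(HIVE_DEFAULT_SESSION_TIMEOUT_MS)
    # Server configured to allow the full 20 minutes.
    full = negotiated_timeout_ms(HIVE_DEFAULT_SESSION_TIMEOUT_MS,
                                 max_session_ms=HIVE_DEFAULT_SESSION_TIMEOUT_MS)
    for label, timeout in (("capped", capped), ("full", full)):
        print(label, timeout, "ms, idle ping after",
              idle_ping_interval_ms(timeout), "ms")
```

Under those assumptions, the time a crashed HiveServer2's ephemeral nodes
linger depends on the server-side cap as much as on the Hive setting, so it is
worth checking both before concluding that stale nodes stay for 20 minutes.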
I do not know enough about the external jobs yet, but I also think the
remaining jobs could be a problem. All in all, solving them with an increased
timeout does not strike me as a good solution: queries in Hive can be huge
and can run for hours or days, so a 20-minute timeout still does not solve the
problem at all. Am I right here, or am I missing some important points?
Thanks,
Peter
> Removing stale Zookeeper locks at HiveServer2 initialization
> ------------------------------------------------------------
>
> Key: HIVE-14979
> URL: https://issues.apache.org/jira/browse/HIVE-14979
> Project: Hive
> Issue Type: Improvement
> Components: Locking
> Reporter: Peter Vary
> Assignee: Peter Vary
> Attachments: HIVE-14979.3.patch, HIVE-14979.4.patch, HIVE-14979.patch
>
>
> HiveServer2 can use Zookeeper to store tokens that indicate that particular
> tables are locked, by creating persistent Zookeeper objects.
> A problem can occur when a HiveServer2 instance creates a lock on a table and
> the HiveServer2 instance then crashes ("Out of Memory", for example), so the
> locks are not released in Zookeeper. Such a lock then remains until it is
> manually cleared by an admin.
> There should be a way to remove stale locks at HiveServer2 initialization,
> to make the admins' life easier.