[jira] [Comment Edited] (HIVE-14979) Removing stale Zookeeper locks at HiveServer2 initialization

Sergey Shelukhin (JIRA) Wed, 19 Oct 2016 11:33:59 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-14979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15589471#comment-15589471
 ]


Sergey Shelukhin edited comment on HIVE-14979 at 10/19/16 6:32 PM:
-------------------------------------------------------------------

Hmm... sorry, I still don't quite understand the problem.

TL;DR the patch makes sense if it is intended to work around some network 
timeouts, or ZK not deleting nodes the way we expect. Otherwise I think we need 
to make sure it's compatible with timeout logic and/or just use ZK expiration.

TL:
Do the locks in ZK already expire at some point after HS2 dies? 
If the locks don't expire, we should make them expire as per below ;)
If they do...
From my understanding, ZK cleans up ephemeral nodes immediately when the 
process goes down in the normal case (based on the connection breaking), 
regardless of the timeout set for the session (that is more of a network timeout 
and would result in nodes being cleaned up if the connection doesn't 
immediately break, or in other "abnormal" cases). 
Is the timeout we add an additional logical timeout on top of the normal 
cleanup, so that even when HS2 dies and the connection is broken, ZK doesn't 
clean up the nodes for some time after the disconnect?

If yes, and we set a large timeout for a reason, we should not clean them up 
before timeout. The reason for a large timeout could be that the locks are 
taken for external jobs that don't die immediately (or at all?) when HS2 dies.
If yes, and we set a large timeout for no good reason (=> we believe we can 
clean them up during startup, as we do in the patch), we should also reduce the 
timeout (or remove it and use the default).
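
To make the "compatible with timeout logic" option concrete, here is a minimal 
sketch of the check a startup cleanup could apply before deleting a lock. It is 
not taken from the patch: the class name StaleLockPolicy, the method 
isSafeToRemove and its parameters are all hypothetical, and it assumes we can 
tell when the lock owner was last seen alive.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper for illustration; not part of Hive or the HIVE-14979 patch. */
final class StaleLockPolicy {
    private StaleLockPolicy() {}

    /**
     * Returns true when a lock may be removed at time {@code now} without
     * cutting the configured timeout short.
     *
     * @param ownerLastSeen when the lock owner was last known to be alive (assumed to be available)
     * @param lockTimeout   the configured grace period; zero or negative means "no grace period"
     * @param now           the time at which the cleanup runs
     */
    static boolean isSafeToRemove(Instant ownerLastSeen, Duration lockTimeout, Instant now) {
        if (lockTimeout.isZero() || lockTimeout.isNegative()) {
            // No grace period configured, so nothing to wait for.
            return true;
        }
        // Safe only once the full timeout has elapsed since the owner was last seen.
        return !now.isBefore(ownerLastSeen.plus(lockTimeout));
    }
}
```

With a 30 minute timeout, a cleanup running 5 minutes after the owner was last 
seen would leave the lock alone, and one running 45 minutes later would remove 
it. If we believe removing the lock at startup is always fine, that is the same 
as saying the timeout should be zero, which is the second case above.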







was (Author: sershe):
Hmm... sorry, I still don't quite understand the problem.

TL;DR the patch makes sense if it is to work around some network timeouts, or 
ZK not deleting nodes the way we expect. Otherwise I think we need to make sure 
it's compatible with timeout logic and/or just use ZK expiration.

TL:
Do the locks in ZK already expire at some point after HS2 dies? 
If the locks don't expire, we should make them expire as per below ;)
If they do...
From my understanding, ZK cleans up ephemeral nodes immediately when the 
process goes down in normal case (based on the connection breaking), 
regardless of the timeout set for session (that is more of a network timeout 
and would result in nodes being cleaned up if the connection doesn't 
immediately break or in other "abnormal" cases). 
Is the timeout we add some additional logical timeout on top of normal cleanup, 
so that even when HS2 dies and the connection is broken, ZK doesn't clean up 
the nodes for some time after the disconnect?

If yes, and we set a large timeout for a reason, we should not clean them up 
before timeout. The reason for a large timeout could be that the locks are 
taken for external jobs that don't die immediately (or at all?) when HS2 dies.
If yes, and we set a large timeout for no good reason (=> we believe we can 
clean them up during startup, as we do in the patch), we should also reduce the 
timeout (or remove it and use the default).






> Removing stale Zookeeper locks at HiveServer2 initialization
> ------------------------------------------------------------
>
>                 Key: HIVE-14979
>                 URL: https://issues.apache.org/jira/browse/HIVE-14979
>             Project: Hive
>          Issue Type: Improvement
>          Components: Locking
>            Reporter: Peter Vary
>            Assignee: Peter Vary
>         Attachments: HIVE-14979.3.patch, HIVE-14979.4.patch, HIVE-14979.patch
>
>
> HiveServer2 can use Zookeeper to store tokens that indicate that particular 
> tables are locked, by creating persistent Zookeeper objects. 
> A problem can occur when a HiveServer2 instance creates a lock on a table and 
> then crashes ("Out of Memory", for example) before the lock is released in 
> Zookeeper. The lock will then remain until it is manually cleared by an admin.
> There should be a way to remove stale locks at HiveServer2 initialization, 
> making the admin's life easier.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
