[ https://issues.apache.org/jira/browse/ACCUMULO-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13633594#comment-13633594 ]
Hudson commented on ACCUMULO-1277:
----------------------------------
Integrated in Accumulo-1.4.x #293 (See [https://builds.apache.org/job/Accumulo-1.4.x/293/])
ACCUMULO-1277 made master delay deleting lockless tserver nodes in zookeeper (Revision 1468589)
Result = SUCCESS
kturner :
Files :
* /accumulo/branches/1.4
* /accumulo/branches/1.4/src
* /accumulo/branches/1.4/src/core
* /accumulo/branches/1.4/src/server
* /accumulo/branches/1.4/src/server/src
* /accumulo/branches/1.4/src/server/src/main/java/org/apache/accumulo/server/master/LiveTServerSet.java
* /accumulo/branches/1.4/src/server/src/main/java/org/apache/accumulo/server/zookeeper/ZooLock.java
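The commit message above summarizes the fix: the master now waits before deleting tserver nodes in zookeeper that have no lock. One way to express that delay is a per-node grace period. The Java sketch below is a minimal, hypothetical illustration of the idea and is not the code from Revision 1468589. `LocklessNodeReaper`, `shouldDelete`, and the `graceMillis` parameter are invented names, and the grace-period policy itself is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not Accumulo's code): the master remembers when it first
 * saw a tserver node without lock data and allows deletion only after a grace period.
 */
class LocklessNodeReaper {
  private final long graceMillis;
  // zookeeper path of a lockless tserver node -> time (ms) it was first seen lockless
  private final Map<String, Long> firstSeenLockless = new HashMap<>();

  LocklessNodeReaper(long graceMillis) {
    this.graceMillis = graceMillis;
  }

  /**
   * Called for each tserver node on every scan.
   *
   * @param path node path, e.g. /accumulo/instance-id/tserver/host:port
   * @param hasLock whether lock data exists under the node
   * @param nowMillis current time in milliseconds
   * @return true if the caller should delete the node now
   */
  boolean shouldDelete(String path, boolean hasLock, long nowMillis) {
    if (hasLock) {
      // The tserver finished acquiring its lock, so forget any earlier lockless sighting.
      firstSeenLockless.remove(path);
      return false;
    }
    Long firstSeen = firstSeenLockless.putIfAbsent(path, nowMillis);
    if (firstSeen == null) {
      // First lockless sighting: the tserver may still be creating its lock.
      return false;
    }
    if (nowMillis - firstSeen >= graceMillis) {
      // Lockless for the whole grace period; treat the node as stale.
      firstSeenLockless.remove(path);
      return true;
    }
    return false;
  }
}
```

A caller would invoke `shouldDelete` for each node on every scan and delete the node only when it returns `true`. A tserver that finishes acquiring its lock within the grace period is never deleted. The grace period trades slower cleanup of dead nodes for tolerance of slow lock creation.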
> Race condition between master and tserver when acquiring tserver lock
> ---------------------------------------------------------------------
>
> Key: ACCUMULO-1277
> URL: https://issues.apache.org/jira/browse/ACCUMULO-1277
> Project: Accumulo
> Issue Type: Bug
> Components: master, tserver
> Affects Versions: 1.4.3
> Reporter: Daniel P Truitt
> Assignee: Keith Turner
> Fix For: 1.5.0, 1.4.4
>
>
> When restarting a stopped tserver, the following happens:
> The tserver (in TabletServer.announceExistence()) creates an entry in
> zookeeper at /accumulo/instance-id/tserver/host:port.
> This, in turn, triggers the master to execute the call chain:
>   LiveTServerSet.process(WatchedEvent)
>   LiveTServerSet.scanServers()
>   LiveTServerSet.checkServer(Set<TServerInstance>, Set<TServerInstance>, String, String)
> The checkServer() method checks whether the ZooLock data has been created
> yet. If the tserver loses the race, the data does not exist yet, so the
> master deletes the tserver node.
> When the tserver then attempts to create the ZooLock, the parent path no
> longer exists, so creating the lock fails. Eventually the tserver times out
> waiting to create the lock and fails to start.
> This problem is easier to reproduce in a smallish cluster using a single
> zookeeper node, when the latency between the tserver and zookeeper is higher
> than the latency between the master and zookeeper.
> This behavior was introduced in the fix for ACCUMULO-1049.
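The interleaving described in the report can be replayed deterministically. The sketch below is a toy, single-threaded model under stated assumptions. `ToyZoo` is a hypothetical in-memory stand-in for zookeeper, not the ZooKeeper client API. The child name `lock` is a placeholder for whatever node ZooLock actually creates. `masterCheckServer` only mimics the delete-if-no-lock behavior the report attributes to `checkServer()`.

```java
import java.util.Set;
import java.util.TreeSet;

/**
 * Toy, single-threaded replay of the interleaving described in ACCUMULO-1277.
 * ToyZoo is a hypothetical in-memory stand-in for zookeeper, not the ZooKeeper
 * client API; all names here are illustrative.
 */
class RaceReplay {
  static final String TSERVERS = "/accumulo/instance-id/tserver";

  /** Minimal node store: a child can only be created under an existing parent. */
  static class ToyZoo {
    private final Set<String> nodes = new TreeSet<>();

    ToyZoo() {
      nodes.add("/accumulo");
      nodes.add("/accumulo/instance-id");
      nodes.add(TSERVERS);
    }

    /** Returns false if the parent is missing, loosely like zookeeper's NoNode error. */
    boolean create(String path) {
      String parent = path.substring(0, path.lastIndexOf('/'));
      if (!nodes.contains(parent)) {
        return false;
      }
      return nodes.add(path);
    }

    boolean exists(String path) {
      return nodes.contains(path);
    }

    void delete(String path) {
      nodes.remove(path);
    }
  }

  /** Mimics the behavior the report attributes to checkServer(): no lock data means delete. */
  static void masterCheckServer(ToyZoo zoo, String serverNode) {
    String lockNode = serverNode + "/lock"; // hypothetical lock child name
    if (zoo.exists(serverNode) && !zoo.exists(lockNode)) {
      zoo.delete(serverNode);
    }
  }

  /** Returns true if the tserver ends up holding its lock under a surviving node. */
  static boolean tserverGetsLock(boolean masterScansBeforeLock) {
    ToyZoo zoo = new ToyZoo();
    String serverNode = TSERVERS + "/host:port";
    zoo.create(serverNode);                            // tserver: announceExistence()
    if (masterScansBeforeLock) {
      masterCheckServer(zoo, serverNode);              // master wins the race and deletes the node
    }
    boolean locked = zoo.create(serverNode + "/lock"); // tserver: try to create its lock
    masterCheckServer(zoo, serverNode);                // a later master scan
    return locked && zoo.exists(serverNode);
  }
}
```

`tserverGetsLock(false)` returns `true`, while `tserverGetsLock(true)` returns `false`. Once the master has deleted the node, the lock's parent path is gone, which matches the failure described in the report. The real tserver keeps retrying until it times out; this toy model stops after one attempt.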
--
This message is automatically generated by JIRA.