[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402997#comment-13402997
 ] 

Rakesh R commented on ZOOKEEPER-1496:
-------------------------------------

I've gone through the issue and attaching the analysis. Apologies in advance 
for the length of this comment - I'm trying to cover all the details.

*Environment details:*
ZK-91 -> Leader
ZK-55 -> Follower
ZK-102 -> Follower
ZKClient session timeout -> 5seconds
ZKClient side logic:- What I observed is, zclient is not waiting for any 
SyncConnected event before creating ephemeral znode 'ActiveStandbyElectorLock'. 
Also, on every connectionloss it is creating the ephemeral znode 
'ActiveStandbyElectorLock'.


*Root Cause:* I'm suspecting there is a race condition between session expiry 
and FinalRequestProcessor. I could see, the expiry logic is removing the 
SessionImpl and is raising the 'closesession' request, other side 
FinalRequestProcessor is creating new SessionImpl into the SessionTracker for 
the 'createsession' request.

This case will come only if FinalRequestProcessor has some delay in processing 
the 'createsession' and in between the same session is getting expired.
----

I've done the analysis in perspective of Leader ZooKeeper server(ZK-91) and is 
as follows:

*Stage-1)* Client has sent connection request and the transaction id is 
'0x200000070'
Following log shows, the server has received connection request and after 
successful proposal, send committing request to the Follower. But the Leader 
side the transaction is in progress, as the request not reaches the 
FinalRequestProcessor.
{noformat}
2012-06-26 13:10:12,566 [myid:3] - DEBUG [ProcessThread(sid:3 
cport:-1)::CommitProcessor@171] - Processing request:: 
sessionid:0x1382791d4e50004 type:createSession cxid:0x0 zxid:0x200000070 
txntype:-10 reqpath:n/a

2012-06-26 13:10:13,001 [myid:3] - DEBUG [ProcessThread(sid:3 
cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 
type:createSession cxid:0x0 zxid:0x200000070 txntype:-10 reqpath:n/a

2012-06-26 13:10:13,202 [myid:3] - DEBUG 
[LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: 
sessionid:0x1382791d4e50004 type:createSession cxid:0x0 zxid:0x200000070 
txntype:-10 reqpath:n/a
{noformat}
----

*Stage-2)* Following log shows the Leader has'nt seen any pings from 
0x1382791d4e50004 and expiring the session by creating 'closesession' request.
 If we see the SessionTrackerImpl, it is removing the SessionImpl corresponding 
to the 0x1382791d4e50004 before raising the expiry request. 
{code}
     sessionsById.remove(s.sessionId);
     expirer.expire(s);
{code}
At the same time, zkclient has sent 'create' command to the server - 
0x200000072 transactionid, now the sessionTracker.checkSession(sessionId, 
owner) will not have the SessionImpl and get null then throwing the session 
expiry exception.
{code}
        SessionImpl session = sessionsById.get(sessionId);
        if (session == null || session.isClosing()) {
            throw new KeeperException.SessionExpiredException();
        }
{code}
{noformat}
2012-06-26 13:10:18,653 [myid:3] - INFO  [SessionTracker:ZooKeeperServer@325] - 
Expiring session 0x1382791d4e50004, timeout of 5000ms exceeded
2012-06-26 13:10:18,803 [myid:3] - INFO  [ProcessThread(sid:3 
cport:-1)::PrepRequestProcessor@627] - Got user-level KeeperException when 
processing sessionid:0x1382791d4e50004 type:create cxid:0x1 zxid:0x200000072 
txntype:-1 reqpath:/hadoop-ha/hacluster/ActiveStandbyElectorLock Error 
Path:null Error:KeeperErrorCode = Session expired
{noformat}
----

*Stage-3)* Following message shows, 'closesession' 0x200000074 has got enough 
Ack and committing. But it didn't reaches the FinalRequestProcessor, so the 
session info will exists in the sessionTracker except the 'SessionImpl' 
object(as session is expired). 
{noformat}
2012-06-26 13:10:18,834 [myid:3] - INFO  [ProcessThread(sid:3 
cport:-1)::PrepRequestProcessor@476] - Processed session termination for 
sessionid: 0x1382791d4e50004
2012-06-26 13:10:18,834 [myid:3] - DEBUG [ProcessThread(sid:3 
cport:-1)::CommitProcessor@171] - Processing request:: 
sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 
txntype:-11 reqpath:n/a
{noformat}
----

*Stage-4)* Following log shows, 'createSession' 0x200000070 has reached 
FinalRequestProcessor and same time received a 'create' request from the client.
Now, the FinalRequestProcessor is processing the transaction and adding the 
SessionImpl back to the SessionTrackerImpl. The 'create' request is seeing this 
new SessionImpl and will continue successfully to next processor.
{code}
FinalRequestProcessor.java
rc = zks.processTxn(hdr, txn);       
{code}

ZooKeeperServer.java
{code}
    public ProcessTxnResult processTxn(TxnHeader hdr, Record txn) {
        ProcessTxnResult rc;
        int opCode = hdr.getType();
        long sessionId = hdr.getClientId();
        rc = getZKDatabase().processTxn(hdr, txn);
        if (opCode == OpCode.createSession) {
            if (txn instanceof CreateSessionTxn) {
                CreateSessionTxn cst = (CreateSessionTxn) txn;
                sessionTracker.addSession(sessionId, cst
                        .getTimeOut());
            } else {
                LOG.warn("*****>>>>> Got "
                        + txn.getClass() + " "
                        + txn.toString());
            }
        } else if (opCode == OpCode.closeSession) {
            sessionTracker.removeSession(sessionId);
        }
        return rc;
    }
{code}

{noformat}
2012-06-26 13:10:19,886 [myid:3] - DEBUG 
[CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: 
sessionid:0x1382791d4e50004 type:createSession cxid:0x0 zxid:0x200000070 
txntype:-10 reqpath:n/a
{noformat}
----

*Stage-5)* 'closesession' 0x200000074 request has come to the 
FinalRequestProcessor and do the session closure
{noformat}
2012-06-26 13:10:20,608 [myid:3] - DEBUG 
[CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: 
sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 
txntype:-11 reqpath:n/a
{noformat}
----

*Stage-6)* Now the 'create' 0x200000075 command, reaches in 
FinalRequestProcessor and resulting the orphan ephmeral znode.
{noformat}
2012-06-26 13:10:19,893 [myid:3] - DEBUG [ProcessThread(sid:3 
cport:-1)::CommitProcessor@171] - Processing request:: 
sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 
reqpath:n/a
2012-06-26 13:10:20,278 [myid:3] - DEBUG 
[LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: 
sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 
reqpath:n/a
2012-06-26 13:10:20,752 [myid:3] - DEBUG 
[CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: 
sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 
reqpath:n/a
{noformat}


-Rakesh
                
> Ephemeral node not getting cleared even after client has exited
> ---------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1496
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1496
>             Project: ZooKeeper
>          Issue Type: Bug
>            Reporter: suja s
>         Attachments: Logs.rar
>
>
> In one of the tests we performed, came across a case where the ephemeral node 
> was not getting cleared from zookeeper though the client exited.
> Zk version: 3.4.3
> Ephemeral node still exists in Zookeeper: 
> HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # date 
> Tue Jun 26 16:07:04 IST 2012 
> HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # ./zkCli.sh -server 
> xx.xx.xx.55:2182 
> Connecting to xx.xx.xx.55:2182 
> Welcome to ZooKeeper! 
> JLine support is enabled 
> [zk: xx.xx.xx.55:2182(CONNECTING) 0] 
> WATCHER:: 
> WatchedEvent state:SyncConnected type:None path:null 
> [zk: xx.xx.xx.55:2182(CONNECTED) 0] get 
> /hadoop-ha/hacluster/ActiveStandbyElectorLock 
> haclusternn2HOSt-xx-xx-xx-102 �� 
> cZxid = 0x200000075 
> ctime = Tue Jun 26 13:10:19 IST 2012 
> mZxid = 0x200000075 
> mtime = Tue Jun 26 13:10:19 IST 2012 
> pZxid = 0x200000075 
> cversion = 0 
> dataVersion = 0 
> aclVersion = 0 
> ephemeralOwner = 0x1382791d4e50004 
> dataLength = 42 
> numChildren = 0 
> [zk: xx.xx.xx.55:2182(CONNECTED) 1] 
> Grepped logs at ZK side for session "0x1382791d4e50004" - close session and 
> later create coming before closesession processed. 
> HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E 
> "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 
> 0x200000074 
> 2012-06-26 13:10:18,834 [myid:3] - DEBUG [ProcessThread(sid:3 
> cport:-1)::CommitProcessor@171] - Processing request:: 
> sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 
> txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:19,892 [myid:3] - DEBUG [ProcessThread(sid:3 
> cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 
> type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:19,919 [myid:3] - DEBUG 
> [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing 
> request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 
> zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:20,608 [myid:3] - DEBUG 
> [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: 
> sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 
> txntype:-11 reqpath:n/a 
> HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E 
> "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 
> 0x200000075 
> 2012-06-26 13:10:19,893 [myid:3] - DEBUG [ProcessThread(sid:3 
> cport:-1)::CommitProcessor@171] - Processing request:: 
> sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 
> reqpath:n/a 
> 2012-06-26 13:10:19,920 [myid:3] - DEBUG [ProcessThread(sid:3 
> cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:create 
> cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:20,278 [myid:3] - DEBUG 
> [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing 
> request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 
> txntype:1 reqpath:n/a 
> 2012-06-26 13:10:20,752 [myid:3] - DEBUG 
> [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: 
> sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 
> reqpath:n/a 
>  Close session and create requests coming almost parallely. 
> Env:
> Hadoop setup.
> We were using Namenode HA with bookkeeper as shared storage and auto failover 
> enabled.
> NN102 was active and NN55 was standby. 
> FailoverController at 102 got shut down due to ZK connection error. 
> The lock-ActiveStandbyElectorLock created (ephemeral node) by this 
> failovercontroller is not cleared from ZK

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira


Reply via email to