[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16121935#comment-16121935
 ] 

Michael Han commented on ZOOKEEPER-2867:
----------------------------------------

bq. That Kafka broker went down around Aug 2, 23:16:11. The expectation is that 
its ZK session will be closed and the registered ephemeral node will be removed.

The expectation will be correct if this Kafka broker really died. But if it was 
a transient interruption, i.e. network issues between that Kafka broker and 
ZooKeeper, then the session will be renewed and ephemeral nodes appertain to 
that session will be intact. From the original log posted, the session 
{{0x25cd1e82c110001}} was renewed (though the timestamp of the log does not 
match {{Aug 2, 23:16:11}} so probably we are not talking about same event. In 
any case, do we have any evidence that indicates the broker in question really 
died?

bq. the following entry doesn't necessarily mean that the ephemeral node is 
actually removed since it could be restored.

Correct, I think this log does not imply the final quorum state W.R.T. the 
ephemeral nodes, because the logging here was done before a consensus regarding 
the ephemeral removal proposal was reached. When a server thinks a client is 
dead and initiates the session closing, it will start removing the ephemerals 
from its data tree while waiting for the votes of the ephemeral removal 
proposal, so the ephemeral nodes removal here does not have transactional 
semantic because it does not consider the proposal results.

bq. Is there any logging indicating the restore?

When log level is set to DEBUG, {{FinalRequestPrecessor}} will log all requests 
it processes, including the ephemeral nodes recreate (which is just normal 
ephemeral node create since it has no notion of a state). You can probably 
search {{type:create}} in logs.

The restore will only happen when a server restarts, or rejoin quorum, and 
finds out it has out dated state with rest of quorum. Otherwise, the ephemeral 
nodes will either be removed already at quorum level, or not removed at all (on 
this specific server) :). In both cases (restarts/rejoin quorum), the server 
will recover the expected state from rest of quorum by replaying transactions 
logs. 

> an expired ZK session can be re-established
> -------------------------------------------
>
>                 Key: ZOOKEEPER-2867
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2867
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.4.10
>            Reporter: Jun Rao
>         Attachments: zk.0.formatted, zk.1.formatted
>
>
> Not sure if this is a real bug, but I found an instance when a ZK client 
> seems to be able to renew a session already expired by the ZK server.
> From ZK server log, session 25cd1e82c110001 was expired at 22:04:39.
> {code:java}
> June 27th 2017, 22:04:39.000  INFO    
> org.apache.zookeeper.server.ZooKeeperServer     Expiring session 
> 0x25cd1e82c110001, timeout of 12000ms exceeded
> June 27th 2017, 22:04:39.001  DEBUG   
> org.apache.zookeeper.server.quorum.Leader       Proposing:: 
> sessionid:0x25cd1e82c110001 type:closeSession cxid:0x0 zxid:0x200000fc4 
> txntype:-11 reqpath:n/a
> June 27th 2017, 22:04:39.001  INFO    
> org.apache.zookeeper.server.PrepRequestProcessor        Processed session 
> termination for sessionid: 0x25cd1e82c110001
> June 27th 2017, 22:04:39.001  DEBUG   
> org.apache.zookeeper.server.quorum.CommitProcessor      Processing request:: 
> sessionid:0x25cd1e82c110001 type:closeSession cxid:0x0 zxid:0x200000fc4 
> txntype:-11 reqpath:n/a
> June 27th 2017, 22:05:20.324  INFO    
> org.apache.zookeeper.server.quorum.Learner      Revalidating client: 
> 0x25cd1e82c110001
> June 27th 2017, 22:05:20.324  INFO    
> org.apache.zookeeper.server.ZooKeeperServer     Client attempting to renew 
> session 0x25cd1e82c110001 at /100.96.5.6:47618
> June 27th 2017, 22:05:20.325  INFO    
> org.apache.zookeeper.server.ZooKeeperServer     Established session 
> 0x25cd1e82c110001 with negotiated timeout 12000 for client /100.96.5.6:47618
> {code}
> From ZK client's log, it was able to renew the expired session on 22:05:20.
> {code:java}
> June 27th 2017, 22:05:18.590  INFO    org.apache.zookeeper.ClientCnxn Client 
> session timed out, have not heard from server in 4485ms for sessionid 
> 0x25cd1e82c110001, closing socket connection and attempting reconnect      0
> June 27th 2017, 22:05:18.590  WARN    org.apache.zookeeper.ClientCnxn Client 
> session timed out, have not heard from server in 4485ms for sessionid 
> 0x25cd1e82c110001  0
> June 27th 2017, 22:05:19.325  WARN    org.apache.zookeeper.ClientCnxn SASL 
> configuration failed: javax.security.auth.login.LoginException: No JAAS 
> configuration section named 'Client' was found in specified JAAS 
> configuration file: '/opt/confluent/etc/kafka/server_jaas.conf'. Will 
> continue connection to Zookeeper server without SASL authentication, if 
> Zookeeper server allows it. 0
> June 27th 2017, 22:05:19.326  INFO    org.apache.zookeeper.ClientCnxn Opening 
> socket connection to server 100.65.188.168/100.65.188.168:2181  0
> June 27th 2017, 22:05:20.324  INFO    org.apache.zookeeper.ClientCnxn Socket 
> connection established to 100.65.188.168/100.65.188.168:2181, initiating 
> session 0
> June 27th 2017, 22:05:20.327  INFO    org.apache.zookeeper.ClientCnxn Session 
> establishment complete on server 100.65.188.168/100.65.188.168:2181, 
> sessionid = 0x25cd1e82c110001, negotiated timeout = 12000  0
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to