[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16383602#comment-16383602
 ] 

Artyom Kravchenko commented on ZOOKEEPER-2219:
----------------------------------------------

Hi 

I have faced with similar issue. I use:

server: Zookeeper server 3.4.11 (need to make cluster of solr instances), Solr 
Server 7.2.1 

client: CloudSolrClient (org.apache.solr:solr-solrj:7.1.0 and 
org.apache.zookeeper:zookeeper:3.4.10) 

 

After some time of work client falls off.

Server logs:
{code:java}
2018-02-28 15:04:04,131 [myid:2] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@942] - Client 
attempting to establish new session at /10.100.53.1:52830
2018-02-28 15:04:04,133 [myid:2] - INFO  
[CommitProcessor:2:ZooKeeperServer@687] - Established session 0x261c838af810485 
with negotiated timeout 10000 for client /10.100.48.1:52827
2018-02-28 15:04:04,135 [myid:2] - INFO  
[CommitProcessor:2:ZooKeeperServer@687] - Established session 0x261c838af810486 
with negotiated timeout 10000 for client /10.100.80.1:52829
2018-02-28 15:04:04,136 [myid:2] - INFO  
[CommitProcessor:2:ZooKeeperServer@687] - Established session 0x261c838af810487 
with negotiated timeout 10000 for client /10.100.53.1:52830
2018-02-28 15:04:08,085 [myid:2] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted 
socket connection from /127.0.0.1:52832
2018-02-28 15:04:08,085 [myid:2] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@883] - Processing ruok 
command from /127.0.0.1:52832
2018-02-28 15:04:08,086 [myid:2] - INFO  [Thread-281185:NIOServerCnxn@1044] - 
Closed socket connection for client /127.0.0.1:52832 (no session established 
for client)
--
--
        at java.lang.Thread.run(Thread.java:748)
2018-02-28 15:04:23,955 [myid:2] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1044] - Closed socket 
connection for client /10.100.80.1:52829 which had sessionid 0x261c838af810486
2018-02-28 15:04:23,955 [myid:2] - WARN  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught end of 
stream exception
EndOfStreamException: Unable to read additional data from client sessionid 
0x261c838af810487, likely client has closed socket
        at 
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:239)
        at 
org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)
        at java.lang.Thread.run(Thread.java:748)
--
--
        at 
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:239)
        at 
org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)
        at java.lang.Thread.run(Thread.java:748)
2018-02-28 15:04:23,956 [myid:2] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1044] - Closed socket 
connection for client /10.100.53.1:52830 which had sessionid 0x261c838af810487
2018-02-28 15:04:23,956 [myid:2] - WARN  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught end of 
stream exception
EndOfStreamException: Unable to read additional data from client sessionid 
0x261c838af810485, likely client has closed socket
        at 
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:239)
--
--
2018-02-28 15:04:36,586 [myid:2] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@942] - Client 
attempting to establish new session at /10.100.48.1:52840
2018-02-28 15:04:36,586 [myid:2] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted 
socket connection from /10.100.80.1:52841
2018-02-28 15:04:36,586 [myid:2] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted 
socket connection from /10.100.53.1:52842
2018-02-28 15:04:36,586 [myid:2] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@935] - Client 
attempting to renew session 0x261c838af810487 at /10.100.80.1:52841
2018-02-28 15:04:36,586 [myid:2] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:Learner@108] - Revalidating client: 
0x261c838af810487
2018-02-28 15:04:36,586 [myid:2] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@935] - Client 
attempting to renew session 0x261c838af810486 at /10.100.53.1:52842
2018-02-28 15:04:36,586 [myid:2] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:Learner@108] - Revalidating client: 
0x261c838af810486
--
2018-02-28 15:04:36,586 [myid:2] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@935] - Client 
attempting to renew session 0x261c838af810486 at /10.100.53.1:52842
2018-02-28 15:04:36,586 [myid:2] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:Learner@108] - Revalidating client: 
0x261c838af810486
2018-02-28 15:04:36,587 [myid:2] - INFO  
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@680] - Invalid session 
0x261c838af810487 for client /10.100.80.1:52841, probably expired
2018-02-28 15:04:36,587 [myid:2] - INFO  
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@680] - Invalid session 
0x261c838af810486 for client /10.100.53.1:52842, probably expired
--
2018-02-28 15:04:36,587 [myid:2] - INFO  
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@680] - Invalid session 
0x261c838af810486 for client /10.100.53.1:52842, probably expired
2018-02-28 15:04:36,592 [myid:2] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1044] - Closed socket 
connection for client /10.100.80.1:52841 which had sessionid 0x261c838af810487
2018-02-28 15:04:36,592 [myid:2] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1044] - Closed socket 
connection for client /10.100.53.1:52842 which had sessionid 0x261c838af810486
2018-02-28 15:04:36,595 [myid:2] - INFO  
[CommitProcessor:2:ZooKeeperServer@687] - Established session 0x261c838af810488 
with negotiated timeout 10000 for client /10.100.48.1:52840
2018-02-28 15:04:38,368 [myid:2] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted 
socket connection from /127.0.0.1:52844{code}
 

Client logs:
{code:java}
Unable to reconnect to ZooKeeper service, session 0x261c838af810487 has expired

{code}
Could anybody provide some update about this issue

 

 

> ZooKeeper server should better handle SessionMovedException.
> ------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2219
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2219
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.4.5
>            Reporter: zhihai xu
>            Priority: Major
>
> ZooKeeper server should better handle SessionMovedException.
> We hit the SessionMovedException. the following is the reason for the 
> SessionMovedException we find:
> 1. ZK client tried to connect to Leader L. Network was very slow, so before 
> leader processed the request, client disconnected.
> 2. Client then re-connected to Follower F reusing the same session ID. It was 
> successful.
> 3. The request in step 1 went into leader. Leader processed it and 
> invalidated the connection created in step 2. But client didn't know the 
> connection it used is invalidated.
> 4. Client got SessionMovedException when it used the connection invalidated 
> by leader for any ZooKeeper operation.
> The following are logs: c045dkh is the Leader, c470udy is the Follower and 
> the sessionID is 0x14be28f50f4419d.
> 1. ZK client try to initiate session to Leader at 015-03-16 10:59:40,735 and 
> timeout after 10/3 seconds.
> logs from ZK client 
> {code}
> 2015-03-16 10:59:40,078 INFO org.apache.zookeeper.ClientCnxn: Client session 
> timed out, have not heard from server in 6670ms for sessionid 
> 0x14be28f50f4419d, closing socket connection and attempting reconnect
> 015-03-16 10:59:40,735 INFO org.apache.zookeeper.ClientCnxn: Opening socket 
> connection to server c045dkh/?.?.?.67:2181. Will not attempt to authenticate 
> using SASL (unknown error)
> 2015-03-16 10:59:40,735 INFO org.apache.zookeeper.ClientCnxn: Socket 
> connection established to c045dkh/?.?.?.67:2181, initiating session
> 2015-03-16 10:59:44,071 INFO org.apache.zookeeper.ClientCnxn: Client session 
> timed out, have not heard from server in 3336ms for sessionid 
> 0x14be28f50f4419d, closing socket connection and attempting reconnect
> {code}
> 2. ZK client initiated session to Follower successfully at 2015-03-16 
> 10:59:44,688
> logs from ZK client
> {code}
> 2015-03-16 10:59:44,673 INFO org.apache.zookeeper.ClientCnxn: Opening socket 
> connection to server c470udy/?.?.?.65:2181. Will not attempt to authenticate 
> using SASL (unknown error)
> 2015-03-16 10:59:44,673 INFO org.apache.zookeeper.ClientCnxn: Socket 
> connection established to c470udy/?.?.?.65:2181, initiating session
> 2015-03-16 10:59:44,688 INFO org.apache.zookeeper.ClientCnxn: Session 
> establishment complete on server c470udy/?.?.?.65:2181, sessionid = 
> 0x14be28f50f4419d, negotiated timeout = 10000
> {code}
> logs from ZK Follower server
> {code}
> 2015-03-16 10:59:44,673 INFO 
> org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket connection 
> from /?.?.?.65:42777
> 2015-03-16 10:59:44,674 INFO org.apache.zookeeper.server.ZooKeeperServer: 
> Client attempting to renew session 0x14be28f50f4419d at /?.?.?.65:42777
> 2015-03-16 10:59:44,674 INFO org.apache.zookeeper.server.quorum.Learner: 
> Revalidating client: 0x14be28f50f4419d
> 2015-03-16 10:59:44,675 INFO org.apache.zookeeper.server.ZooKeeperServer: 
> Established session 0x14be28f50f4419d with negotiated timeout 10000 for 
> client /?.?.?.65:42777
> {code}
> 3. At 2015-03-16 10:59:45,668, Leader processed the delayed request which is 
> sent by Client at 2015-03-16 10:59:40,735, right after session was 
> established, it close the socket connection/session because client was 
> already disconnected due to timeout.
> logs from ZK Leader server
> {code}
> 2015-03-16 10:59:45,668 INFO org.apache.zookeeper.server.ZooKeeperServer: 
> Client attempting to renew session 0x14be28f50f4419d at /?.?.?.65:50271
> 2015-03-16 10:59:45,668 INFO org.apache.zookeeper.server.ZooKeeperServer: 
> Established session 0x14be28f50f4419d with negotiated timeout 10000 for 
> client /?.?.?.65:50271
> 2015-03-16 10:59:45,670 WARN org.apache.zookeeper.server.NIOServerCnxn: 
> Exception causing close of session 0x14be28f50f4419d due to 
> java.io.IOException: Broken pipe
> 2015-03-16 10:59:45,671 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> Closed socket connection for client /?.?.?.65:50271 which had sessionid 
> 0x14be28f50f4419d
> {code}
> 4. Client got SessionMovedException at 2015-03-16 10:59:45,693
> logs from ZK Leader server
> {code}
> 2015-03-16 10:59:45,693 INFO 
> org.apache.zookeeper.server.PrepRequestProcessor: Got user-level 
> KeeperException when processing sessionid:0x14be28f50f4419d type:multi 
> cxid:0x86e3 zxid:0x1c002a4e53 txntype:-1 reqpath:n/a aborting remaining multi 
> ops. Error Path:null Error:KeeperErrorCode = Session moved
> 2015-03-16 10:59:45,695 INFO 
> org.apache.zookeeper.server.PrepRequestProcessor: Got user-level 
> KeeperException when processing sessionid:0x14be28f50f4419d type:multi 
> cxid:0x86e5 zxid:0x1c002a4e56 txntype:-1 reqpath:n/a aborting remaining multi 
> ops. Error Path:null Error:KeeperErrorCode = Session moved
> 2015-03-16 10:59:45,700 INFO 
> org.apache.zookeeper.server.PrepRequestProcessor: Got user-level 
> KeeperException when processing sessionid:0x14be28f50f4419d type:multi 
> cxid:0x86e7 zxid:0x1c002a4e57 txntype:-1 reqpath:n/a aborting remaining multi 
> ops. Error Path:null Error:KeeperErrorCode = Session moved
> {code}
> 5. At 2015-03-16 10:59:45,710, we close the session 0x14be28f50f4419d but the 
> socket connection between ZK client and ZK Follower is closed at 2015-03-16 
> 10:59:45,715 after session termination.
> logs from ZK Leader server:
> {code}
> 2015-03-16 10:59:45,710 INFO 
> org.apache.zookeeper.server.PrepRequestProcessor: Processed session 
> termination for sessionid: 0x14be28f50f4419d
> {code}
> logs from ZK Follower server:
> {code}
> 2015-03-16 10:59:45,715 INFO org.apache.zookeeper.server.NIOServerCnxn: 
> Closed socket connection for client /?.?.?.65:42777 which had sessionid 
> 0x14be28f50f4419d
> {code}
> It looks like Zk client is out-of-sync with ZK server.
> My question is how ZK client can recover from this error. It looks like the 
> ZK Client won't be disconnected from Follower until session is closed and any 
> ZooKeeper operation will fail with SessionMovedException before session is 
> closed.
> Also since ZK Leader already closed the socket connection/session to ZK 
> Client at step 3, why it still reject the ZooKeeper operation from client 
> with SessionMovedException. Will it be better to endorse the 
> session/connection between ZK client and ZK Follower? This seems like a bug 
> to me. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to