[
https://issues.apache.org/jira/browse/ZOOKEEPER-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16383602#comment-16383602
]
Artyom Kravchenko commented on ZOOKEEPER-2219:
----------------------------------------------
Hi
I have faced with similar issue. I use:
server: Zookeeper server 3.4.11 (need to make cluster of solr instances), Solr
Server 7.2.1
client: CloudSolrClient (org.apache.solr:solr-solrj:7.1.0 and
org.apache.zookeeper:zookeeper:3.4.10)
After some time of work client falls off.
Server logs:
{code:java}
2018-02-28 15:04:04,131 [myid:2] - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@942] - Client
attempting to establish new session at /10.100.53.1:52830
2018-02-28 15:04:04,133 [myid:2] - INFO
[CommitProcessor:2:ZooKeeperServer@687] - Established session 0x261c838af810485
with negotiated timeout 10000 for client /10.100.48.1:52827
2018-02-28 15:04:04,135 [myid:2] - INFO
[CommitProcessor:2:ZooKeeperServer@687] - Established session 0x261c838af810486
with negotiated timeout 10000 for client /10.100.80.1:52829
2018-02-28 15:04:04,136 [myid:2] - INFO
[CommitProcessor:2:ZooKeeperServer@687] - Established session 0x261c838af810487
with negotiated timeout 10000 for client /10.100.53.1:52830
2018-02-28 15:04:08,085 [myid:2] - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted
socket connection from /127.0.0.1:52832
2018-02-28 15:04:08,085 [myid:2] - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@883] - Processing ruok
command from /127.0.0.1:52832
2018-02-28 15:04:08,086 [myid:2] - INFO [Thread-281185:NIOServerCnxn@1044] -
Closed socket connection for client /127.0.0.1:52832 (no session established
for client)
--
--
at java.lang.Thread.run(Thread.java:748)
2018-02-28 15:04:23,955 [myid:2] - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1044] - Closed socket
connection for client /10.100.80.1:52829 which had sessionid 0x261c838af810486
2018-02-28 15:04:23,955 [myid:2] - WARN
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught end of
stream exception
EndOfStreamException: Unable to read additional data from client sessionid
0x261c838af810487, likely client has closed socket
at
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:239)
at
org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)
at java.lang.Thread.run(Thread.java:748)
--
--
at
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:239)
at
org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)
at java.lang.Thread.run(Thread.java:748)
2018-02-28 15:04:23,956 [myid:2] - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1044] - Closed socket
connection for client /10.100.53.1:52830 which had sessionid 0x261c838af810487
2018-02-28 15:04:23,956 [myid:2] - WARN
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@368] - caught end of
stream exception
EndOfStreamException: Unable to read additional data from client sessionid
0x261c838af810485, likely client has closed socket
at
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:239)
--
--
2018-02-28 15:04:36,586 [myid:2] - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@942] - Client
attempting to establish new session at /10.100.48.1:52840
2018-02-28 15:04:36,586 [myid:2] - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted
socket connection from /10.100.80.1:52841
2018-02-28 15:04:36,586 [myid:2] - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted
socket connection from /10.100.53.1:52842
2018-02-28 15:04:36,586 [myid:2] - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@935] - Client
attempting to renew session 0x261c838af810487 at /10.100.80.1:52841
2018-02-28 15:04:36,586 [myid:2] - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:Learner@108] - Revalidating client:
0x261c838af810487
2018-02-28 15:04:36,586 [myid:2] - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@935] - Client
attempting to renew session 0x261c838af810486 at /10.100.53.1:52842
2018-02-28 15:04:36,586 [myid:2] - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:Learner@108] - Revalidating client:
0x261c838af810486
--
2018-02-28 15:04:36,586 [myid:2] - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@935] - Client
attempting to renew session 0x261c838af810486 at /10.100.53.1:52842
2018-02-28 15:04:36,586 [myid:2] - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:Learner@108] - Revalidating client:
0x261c838af810486
2018-02-28 15:04:36,587 [myid:2] - INFO
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@680] - Invalid session
0x261c838af810487 for client /10.100.80.1:52841, probably expired
2018-02-28 15:04:36,587 [myid:2] - INFO
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@680] - Invalid session
0x261c838af810486 for client /10.100.53.1:52842, probably expired
--
2018-02-28 15:04:36,587 [myid:2] - INFO
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@680] - Invalid session
0x261c838af810486 for client /10.100.53.1:52842, probably expired
2018-02-28 15:04:36,592 [myid:2] - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1044] - Closed socket
connection for client /10.100.80.1:52841 which had sessionid 0x261c838af810487
2018-02-28 15:04:36,592 [myid:2] - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1044] - Closed socket
connection for client /10.100.53.1:52842 which had sessionid 0x261c838af810486
2018-02-28 15:04:36,595 [myid:2] - INFO
[CommitProcessor:2:ZooKeeperServer@687] - Established session 0x261c838af810488
with negotiated timeout 10000 for client /10.100.48.1:52840
2018-02-28 15:04:38,368 [myid:2] - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted
socket connection from /127.0.0.1:52844{code}
Client logs:
{code:java}
Unable to reconnect to ZooKeeper service, session 0x261c838af810487 has expired
{code}
Could anybody provide some update about this issue
> ZooKeeper server should better handle SessionMovedException.
> ------------------------------------------------------------
>
> Key: ZOOKEEPER-2219
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2219
> Project: ZooKeeper
> Issue Type: Bug
> Affects Versions: 3.4.5
> Reporter: zhihai xu
> Priority: Major
>
> ZooKeeper server should better handle SessionMovedException.
> We hit the SessionMovedException. the following is the reason for the
> SessionMovedException we find:
> 1. ZK client tried to connect to Leader L. Network was very slow, so before
> leader processed the request, client disconnected.
> 2. Client then re-connected to Follower F reusing the same session ID. It was
> successful.
> 3. The request in step 1 went into leader. Leader processed it and
> invalidated the connection created in step 2. But client didn't know the
> connection it used is invalidated.
> 4. Client got SessionMovedException when it used the connection invalidated
> by leader for any ZooKeeper operation.
> The following are logs: c045dkh is the Leader, c470udy is the Follower and
> the sessionID is 0x14be28f50f4419d.
> 1. ZK client try to initiate session to Leader at 015-03-16 10:59:40,735 and
> timeout after 10/3 seconds.
> logs from ZK client
> {code}
> 2015-03-16 10:59:40,078 INFO org.apache.zookeeper.ClientCnxn: Client session
> timed out, have not heard from server in 6670ms for sessionid
> 0x14be28f50f4419d, closing socket connection and attempting reconnect
> 015-03-16 10:59:40,735 INFO org.apache.zookeeper.ClientCnxn: Opening socket
> connection to server c045dkh/?.?.?.67:2181. Will not attempt to authenticate
> using SASL (unknown error)
> 2015-03-16 10:59:40,735 INFO org.apache.zookeeper.ClientCnxn: Socket
> connection established to c045dkh/?.?.?.67:2181, initiating session
> 2015-03-16 10:59:44,071 INFO org.apache.zookeeper.ClientCnxn: Client session
> timed out, have not heard from server in 3336ms for sessionid
> 0x14be28f50f4419d, closing socket connection and attempting reconnect
> {code}
> 2. ZK client initiated session to Follower successfully at 2015-03-16
> 10:59:44,688
> logs from ZK client
> {code}
> 2015-03-16 10:59:44,673 INFO org.apache.zookeeper.ClientCnxn: Opening socket
> connection to server c470udy/?.?.?.65:2181. Will not attempt to authenticate
> using SASL (unknown error)
> 2015-03-16 10:59:44,673 INFO org.apache.zookeeper.ClientCnxn: Socket
> connection established to c470udy/?.?.?.65:2181, initiating session
> 2015-03-16 10:59:44,688 INFO org.apache.zookeeper.ClientCnxn: Session
> establishment complete on server c470udy/?.?.?.65:2181, sessionid =
> 0x14be28f50f4419d, negotiated timeout = 10000
> {code}
> logs from ZK Follower server
> {code}
> 2015-03-16 10:59:44,673 INFO
> org.apache.zookeeper.server.NIOServerCnxnFactory: Accepted socket connection
> from /?.?.?.65:42777
> 2015-03-16 10:59:44,674 INFO org.apache.zookeeper.server.ZooKeeperServer:
> Client attempting to renew session 0x14be28f50f4419d at /?.?.?.65:42777
> 2015-03-16 10:59:44,674 INFO org.apache.zookeeper.server.quorum.Learner:
> Revalidating client: 0x14be28f50f4419d
> 2015-03-16 10:59:44,675 INFO org.apache.zookeeper.server.ZooKeeperServer:
> Established session 0x14be28f50f4419d with negotiated timeout 10000 for
> client /?.?.?.65:42777
> {code}
> 3. At 2015-03-16 10:59:45,668, Leader processed the delayed request which is
> sent by Client at 2015-03-16 10:59:40,735, right after session was
> established, it close the socket connection/session because client was
> already disconnected due to timeout.
> logs from ZK Leader server
> {code}
> 2015-03-16 10:59:45,668 INFO org.apache.zookeeper.server.ZooKeeperServer:
> Client attempting to renew session 0x14be28f50f4419d at /?.?.?.65:50271
> 2015-03-16 10:59:45,668 INFO org.apache.zookeeper.server.ZooKeeperServer:
> Established session 0x14be28f50f4419d with negotiated timeout 10000 for
> client /?.?.?.65:50271
> 2015-03-16 10:59:45,670 WARN org.apache.zookeeper.server.NIOServerCnxn:
> Exception causing close of session 0x14be28f50f4419d due to
> java.io.IOException: Broken pipe
> 2015-03-16 10:59:45,671 INFO org.apache.zookeeper.server.NIOServerCnxn:
> Closed socket connection for client /?.?.?.65:50271 which had sessionid
> 0x14be28f50f4419d
> {code}
> 4. Client got SessionMovedException at 2015-03-16 10:59:45,693
> logs from ZK Leader server
> {code}
> 2015-03-16 10:59:45,693 INFO
> org.apache.zookeeper.server.PrepRequestProcessor: Got user-level
> KeeperException when processing sessionid:0x14be28f50f4419d type:multi
> cxid:0x86e3 zxid:0x1c002a4e53 txntype:-1 reqpath:n/a aborting remaining multi
> ops. Error Path:null Error:KeeperErrorCode = Session moved
> 2015-03-16 10:59:45,695 INFO
> org.apache.zookeeper.server.PrepRequestProcessor: Got user-level
> KeeperException when processing sessionid:0x14be28f50f4419d type:multi
> cxid:0x86e5 zxid:0x1c002a4e56 txntype:-1 reqpath:n/a aborting remaining multi
> ops. Error Path:null Error:KeeperErrorCode = Session moved
> 2015-03-16 10:59:45,700 INFO
> org.apache.zookeeper.server.PrepRequestProcessor: Got user-level
> KeeperException when processing sessionid:0x14be28f50f4419d type:multi
> cxid:0x86e7 zxid:0x1c002a4e57 txntype:-1 reqpath:n/a aborting remaining multi
> ops. Error Path:null Error:KeeperErrorCode = Session moved
> {code}
> 5. At 2015-03-16 10:59:45,710, we close the session 0x14be28f50f4419d but the
> socket connection between ZK client and ZK Follower is closed at 2015-03-16
> 10:59:45,715 after session termination.
> logs from ZK Leader server:
> {code}
> 2015-03-16 10:59:45,710 INFO
> org.apache.zookeeper.server.PrepRequestProcessor: Processed session
> termination for sessionid: 0x14be28f50f4419d
> {code}
> logs from ZK Follower server:
> {code}
> 2015-03-16 10:59:45,715 INFO org.apache.zookeeper.server.NIOServerCnxn:
> Closed socket connection for client /?.?.?.65:42777 which had sessionid
> 0x14be28f50f4419d
> {code}
> It looks like Zk client is out-of-sync with ZK server.
> My question is how ZK client can recover from this error. It looks like the
> ZK Client won't be disconnected from Follower until session is closed and any
> ZooKeeper operation will fail with SessionMovedException before session is
> closed.
> Also since ZK Leader already closed the socket connection/session to ZK
> Client at step 3, why it still reject the ZooKeeper operation from client
> with SessionMovedException. Will it be better to endorse the
> session/connection between ZK client and ZK Follower? This seems like a bug
> to me.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)