[
https://issues.apache.org/jira/browse/ZOOKEEPER-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gangadhar updated ZOOKEEPER-3906:
---------------------------------
Description:
*Issue*: Data inconsistency between the ZooKeeper leader and the ZooKeeper followers.
The ZooKeeper followers and the ZooKeeper leader hold different data. When we
try to delete a znode that is present on the followers but not on the leader,
the delete fails with an error like *Node does not exist:*
*Expected behaviour:* The ZooKeeper leader and the ZooKeeper followers should
hold the same data.
Steps followed as part of troubleshooting:
We have 5 ZooKeeper servers in the cluster.
*Step 1:* Verified whether all ZooKeeper servers are following the leader. As
per the information below, all 4 followers are following the ZooKeeper leader.
zk_version 3.5.7-f0fdd52973d373ffd9c86b81d99842dc2c7f660e, built on
02/11/2020 11:30 GMT
zk_avg_latency 0
zk_max_latency 823
zk_min_latency 0
zk_packets_received 30214264
zk_packets_sent 32424272
zk_num_alive_connections 7
zk_outstanding_requests 0
zk_server_state leader
zk_znode_count 75190
zk_watch_count 21394
zk_ephemerals_count 793
zk_approximate_data_size 24706628
zk_open_file_descriptor_count 281
zk_max_file_descriptor_count 4096
zk_followers 4
zk_synced_followers 4
zk_pending_syncs 0
zk_last_proposal_size 166
zk_max_proposal_size 121947
zk_min_proposal_size 32
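The check in Step 1 could be scripted roughly as follows. This is a minimal sketch, assuming the statistics above are the output of ZooKeeper's `mntr` command saved as text; `parse_mntr` and `followers_synced` are hypothetical helper names, not part of ZooKeeper.

```python
def parse_mntr(text):
    """Parse mntr-style 'key value' lines into a dict, converting integers."""
    stats = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # key, then the rest of the line
        if len(parts) != 2:
            continue  # skip blank lines and lines without a value
        key, value = parts
        value = value.strip()
        stats[key] = int(value) if value.isdigit() else value
    return stats


def followers_synced(stats):
    """True when a leader reports every follower as synced."""
    followers = stats.get("zk_followers")
    return (
        stats.get("zk_server_state") == "leader"
        and followers is not None
        and followers == stats.get("zk_synced_followers")
    )


# A few lines copied from the leader's output above.
sample = """\
zk_server_state leader
zk_followers 4
zk_synced_followers 4
zk_pending_syncs 0
"""

stats = parse_mntr(sample)
print(followers_synced(stats))
```

Matching follower counts suggest that the followers are connected and keeping up with the leader. They do not by themselves show that every server holds identical znodes, which is why Step 2 is still needed.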
*Step 2:* Checked the znode on all the ZooKeeper servers, but the ZooKeeper
leader and the followers do not return the same information.
*Step 3:* Tried to delete the znode and received the error below. We suspect
that the delete request is sent to the ZooKeeper leader, which then throws the
*Node does not exist* error.
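The comparison in Step 2 can be made more systematic by recording what each server returns for the same path and grouping the servers by result. The following is a minimal sketch, assuming the per-server reads have already been collected (for example by connecting a client to one server at a time); the server labels, the payload, and the helper names are hypothetical.

```python
from collections import defaultdict


def group_by_view(views):
    """Group servers by the value they returned for one znode.

    `views` maps a server label to the bytes that server returned,
    or None if that server reported the node as missing.
    """
    groups = defaultdict(list)
    for server, data in views.items():
        groups[data].append(server)
    return dict(groups)


def is_consistent(views):
    """True when every server returned the same result."""
    return len(group_by_view(views)) <= 1


# Hypothetical observations for a single path on a 5-server ensemble.
views = {
    "zk1 (leader)": None,       # leader reports the node as missing
    "zk2": b"example-payload",
    "zk3": b"example-payload",
    "zk4": b"example-payload",
    "zk5": b"example-payload",
}

for data, servers in group_by_view(views).items():
    print(data, servers)
print("consistent:", is_consistent(views))
```

In ZooKeeper, reads are answered by the server the client is connected to, while writes such as a delete are processed through the leader. A node that is missing only on the leader would therefore be consistent with the error seen in Step 3.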
*Error:*
14:04:54.769 [main] INFO org.apache.zookeeper.ClientCnxnSocket -
jute.maxbuffer value is 10485760 Bytes
14:04:54.775 [main] INFO org.apache.zookeeper.ClientCnxn -
zookeeper.request.timeout value is 0. feature enabled=
14:04:54.824 [main-SendThread(11.111.226.146:2181)] INFO
org.apache.zookeeper.ClientCnxn - Opening socket connection to server
11-111-226-146.ebiz.verizon.com/11.111.226.146:2181. Will not attempt to
authenticate using SASL (unknown error)
14:04:54.831 [main-SendThread(11.111.226.146:2181)] INFO
org.apache.zookeeper.ClientCnxn - Socket connection established, initiating
session, client: /11.111.225.75:38804, server:
11-111-226-146.ebiz.verizon.com/11.111.20.146:2181
14:04:54.835 [main-SendThread(11.111.226.146:2181)] INFO
org.apache.zookeeper.ClientCnxn - Session establishment complete on server
11-111-226-146.ebiz.verizon.com/11.111.226.146:2181, sessionid =
0x500001bbbeb0651, negotiated timeout = 20000
WATCHER::
WatchedEvent state:SyncConnected type:None path:null
*Node does not exist: /namespace/$tenant/$Namespace/$zk-path*
was:
Issue: Data inconsistency between the ZooKeeper leader and the ZooKeeper followers.
When we did a topic lookup for one of the topics, we got a "broker not part
of the cluster" error, and we verified the things below as part of troubleshooting.
Steps followed as part of troubleshooting:
We have a 5-node ZooKeeper cluster.
*Step 1:* Verified whether all ZooKeeper servers are following the leader. As
per the information below, all 4 followers are following the ZooKeeper leader.
zk_version 3.5.7-f0fdd52973d373ffd9c86b81d99842dc2c7f660e, built on
02/11/2020 11:30 GMT
zk_avg_latency 0
zk_max_latency 823
zk_min_latency 0
zk_packets_received 30214264
zk_packets_sent 32424272
zk_num_alive_connections 7
zk_outstanding_requests 0
zk_server_state leader
zk_znode_count 75190
zk_watch_count 21394
zk_ephemerals_count 793
zk_approximate_data_size 24706628
zk_open_file_descriptor_count 281
zk_max_file_descriptor_count 4096
zk_followers 4
zk_synced_followers 4
zk_pending_syncs 0
zk_last_proposal_size 166
zk_max_proposal_size 121947
zk_min_proposal_size 32
*Step 2:* Checked the namespace bundle on all the ZooKeeper servers using the
command below. Every server returned the information except the ZooKeeper
leader.
./pulsar zookeeper-shell get /namespace/$tenant/$Namespace/$Bundle
*Step 3:* Tried to delete the Namespace/$Bundle so that another broker can
take ownership of the topic.
./pulsar zookeeper-shell deleteall /namespace/$tenant/$Namespace/$Bundle
*Error:*
14:04:54.769 [main] INFO org.apache.zookeeper.ClientCnxnSocket -
jute.maxbuffer value is 10485760 Bytes
14:04:54.775 [main] INFO org.apache.zookeeper.ClientCnxn -
zookeeper.request.timeout value is 0. feature enabled=
14:04:54.824 [main-SendThread(11.111.226.146:2181)] INFO
org.apache.zookeeper.ClientCnxn - Opening socket connection to server
11-111-226-146.ebiz.verizon.com/11.111.226.146:2181. Will not attempt to
authenticate using SASL (unknown error)
14:04:54.831 [main-SendThread(11.111.226.146:2181)] INFO
org.apache.zookeeper.ClientCnxn - Socket connection established, initiating
session, client: /11.111.225.75:38804, server:
11-111-226-146.ebiz.verizon.com/11.111.20.146:2181
14:04:54.835 [main-SendThread(11.111.226.146:2181)] INFO
org.apache.zookeeper.ClientCnxn - Session establishment complete on server
11-111-226-146.ebiz.verizon.com/11.111.226.146:2181, sessionid =
0x500001bbbeb0651, negotiated timeout = 20000
WATCHER::
WatchedEvent state:SyncConnected type:None path:null
*Node does not exist: /namespace/$tenant/$Namespace/$Bundle*
> Data Inconsistency Between Zookeeper Leader and zookeeper Followers
> -------------------------------------------------------------------
>
> Key: ZOOKEEPER-3906
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3906
> Project: ZooKeeper
> Issue Type: Bug
> Components: quorum
> Affects Versions: 3.5.7
> Reporter: Gangadhar
> Priority: Major