FYI the patch has been merged to master e to 3.5 branch thank you ! https://issues.apache.org/jira/browse/ZOOKEEPER-3066
Enrico Il giorno gio 21 giu 2018 alle ore 17:38 Enrico Olivelli < eolive...@gmail.com> ha scritto: > > > Il gio 21 giu 2018, 17:13 Andor Molnar <an...@cloudera.com.invalid> ha > scritto: > >> Okay, let's put it this way: what kind of test would you like to add here? >> >> *Unit test* >> This is a requirement for the patch to be committed. You don't need to >> fire >> up a full blown ensemble to test the new functionality. Pick the class >> that >> you'd like to test, mock out the dependencies with moquito and write some >> quick unit tests. Your tests should finish in a few hundreds millisecs >> (1-2 >> secs maximum). >> > > This is achieved in my patch. There was already a test about JMX and > LocalPeerMXBean. I have enhanced it, by extending the test to cover my new > JMX attribute and, to cover even RemotePeerMXBean, which was not tested at > all. > > One review comment asked me to perform assertions after a leadership > change. > > I did not touch ZK internals with this patch, I am only publishing > internal state to JMX endpoint. > > Maybe what I am trying to do is not needed for the patch to be > 'acceptable'. > > I think it would be useful to add such test case, but not strictly needed. > Unfortunately current test utilities seem to able to do what requested. > > I am able to add mockito based tests if requested. > > > > > >> *Integration test* >> This is not required, but more than welcome. Can you do it with reasonable >> amount of effort? Is it going to finish in an acceptable time frame (5-10 >> secs)? If yes, go ahead, if not, skip that. >> > > Not needed. Maybe existing test case is already similar to an integration > test ( it brings up a cluster of 5 nodes). > > Anyway my changes is not slowing down the full suite add I am only adding > an assertion to an existing old test. > > I think the restart of the cluster it is not strictly needed and it will > may add a new flaky test. > > Treating ZK as one of my projects, I think the existing JMX test corpus > should be refactored but given the little impact of this patch is it not > worth to spend much time. > > > > Enrico > > > > > >> Hope that helps. >> >> Regards, >> Andor >> >> >> >> >> >> >> On Thu, Jun 21, 2018 at 2:39 PM, Enrico Olivelli <eolive...@gmail.com> >> wrote: >> >> > Andor, >> > >> > I had already tried that way but without success. >> > I have updated the patch with a proposal. >> > Thanks >> > >> > Enrico >> > >> > Il gio 21 giu 2018, 13:20 Andor Molnar <an...@cloudera.com.invalid> ha >> > scritto: >> > >> > > Hi Enrico! >> > > >> > > Take a look at testElectionFraud(), it might be useful to you. >> > > >> > > Regards, >> > > Andor >> > > >> > > >> > > >> > > On Thu, Jun 21, 2018 at 12:00 PM, Enrico Olivelli < >> eolive...@gmail.com> >> > > wrote: >> > > >> > > > Norbert, >> > > > thank you for taking a look >> > > > >> > > > Il giorno gio 21 giu 2018 alle ore 11:58 Norbert Kalmar >> > > > <nkal...@cloudera.com.invalid> ha scritto: >> > > > >> > > > > Good question. Wouldn't just killing the leader do the job? >> > > > > >> > > > >> > > > It seems to me that is it not "immediate" to "kill" a QuorumPeer >> inside >> > > > such test. >> > > > If you "shutdown" it you cannot 'start' it anymore. >> > > > >> > > > This is why I am asking for help. >> > > > >> > > > -- Enrico >> > > > >> > > > >> > > > >> > > > > >> > > > > Regards, >> > > > > Norbert >> > > > > >> > > > > On Thu, Jun 21, 2018 at 3:51 AM Enrico Olivelli < >> eolive...@gmail.com >> > > >> > > > > wrote: >> > > > > >> > > > > > This is my patch >> > > > > > https://github.com/apache/zookeeper/pull/546 >> > > > > > >> > > > > > One question: >> > > > > > I would like to force the cluster to change leader in >> > > > > > HierarchicalQuorumTest, so that I can test that JMX will reflect >> > the >> > > > new >> > > > > > status of the group. >> > > > > > Any idea about how to bounce the leader ? >> > > > > > >> > > > > > Cheers >> > > > > > Enrico >> > > > > > >> > > > > > Il giorno mer 20 giu 2018 alle ore 13:45 Enrico Olivelli < >> > > > > > eolive...@gmail.com> ha scritto: >> > > > > > >> > > > > > > This is my JIRA >> > > > > > > I am going to work on a patch >> > > > > > > >> > > > > > > https://issues.apache.org/jira/browse/ZOOKEEPER-3066 >> > > > > > > >> > > > > > > Enrico >> > > > > > > >> > > > > > > Il gio 10 mag 2018, 19:47 Andor Molnar <an...@cloudera.com> >> ha >> > > > > scritto: >> > > > > > > >> > > > > > >> "in order to guess which is the leader I have to ask to all >> of >> > the >> > > > > three >> > > > > > >> nodes in the cluster" >> > > > > > >> >> > > > > > >> That's correct. >> > > > > > >> >> > > > > > >> Regards, >> > > > > > >> Andor >> > > > > > >> >> > > > > > >> >> > > > > > >> >> > > > > > >> On Thu, May 10, 2018 at 4:07 AM, Enrico Olivelli < >> > > > eolive...@gmail.com >> > > > > > >> > > > > > >> wrote: >> > > > > > >> >> > > > > > >> > Il giorno mer 9 mag 2018 alle ore 20:24 Patrick Hunt < >> > > > > > ph...@apache.org> >> > > > > > >> ha >> > > > > > >> > scritto: >> > > > > > >> > >> > > > > > >> > > iiuc what you are interested in the information is >> already >> > > > > > available. >> > > > > > >> The >> > > > > > >> > > beans have a "state" attribute which indicates following >> vs >> > > > > leading. >> > > > > > >> > > >> > > > > > >> > > Try attaching a jconsole to the running servers, use the >> > > > "mbeans" >> > > > > > tab >> > > > > > >> and >> > > > > > >> > > open org.apache.ZooKeeperService -> replicatedserver -> >> > > replica >> > > > -> >> > > > > > >> > > attributes, you'll see the "state" attribute there. >> > > > > > >> > > >> > > > > > >> > >> > > > > > >> > >> > > > > > >> > Patrick, >> > > > > > >> > I can't find this information. >> > > > > > >> > If I log into a "follower" I get this info only for the >> > 'current >> > > > > > >> replica' >> > > > > > >> > >> > > > > > >> > Example, I have three peers, the first one is a >> "Follower", >> > on >> > > > JMX >> > > > > I >> > > > > > >> have >> > > > > > >> > o.a.ZookKeeperService - ReplicatedServer_id1 - replica_1 - >> > > > > > ClientAddress >> > > > > > >> > o.a.ZookKeeperService - ReplicatedServer_id1 - replica_1 - >> > > > > > >> ElectionAddress >> > > > > > >> > o.a.ZookKeeperService - ReplicatedServer_id1 - replica_1 - >> > > > > LearnerType >> > > > > > >> > o.a.ZookKeeperService - ReplicatedServer_id1 - replica_1 - >> > Name >> > > > > > >> > o.a.ZookKeeperService - ReplicatedServer_id1 - replica_1 - >> > > > > > QuorumAddress >> > > > > > >> > o.a.ZookKeeperService - ReplicatedServer_id1 - replica_1 - >> > > State = >> > > > > > >> > following >> > > > > > >> > o.a.ZookKeeperService - ReplicatedServer_id1 - replica_1 - >> > > > > > ConfigVersion >> > > > > > >> > ..... >> > > > > > >> > >> > > > > > >> > >> > > > > > >> > >> > > > > > >> > for other peers I see only >> > > > > > >> > o.a.ZookKeeperService - ReplicatedServer_id1 - replica_2 - >> > > > > > ClientAddress >> > > > > > >> > o.a.ZookKeeperService - ReplicatedServer_id1 - replica_2 - >> > > > > > >> ElectionAddress >> > > > > > >> > o.a.ZookKeeperService - ReplicatedServer_id1 - replica_2 - >> > > > > LearnerType >> > > > > > >> > o.a.ZookKeeperService - ReplicatedServer_id1 - replica_2 - >> > Name >> > > > > > >> > o.a.ZookKeeperService - ReplicatedServer_id1 - replica_2 - >> > > > > > QuorumAddress >> > > > > > >> > >> > > > > > >> > o.a.ZookKeeperService - ReplicatedServer_id1 - replica_3 - >> > > > > > ClientAddress >> > > > > > >> > o.a.ZookKeeperService - ReplicatedServer_id1 - replica_3 - >> > > > > > >> ElectionAddress >> > > > > > >> > o.a.ZookKeeperService - ReplicatedServer_id1 - replica_3 - >> > > > > LearnerType >> > > > > > >> > o.a.ZookKeeperService - ReplicatedServer_id1 - replica_3 - >> > Name >> > > > > > >> > o.a.ZookKeeperService - ReplicatedServer_id1 - replica_3 - >> > > > > > QuorumAddress >> > > > > > >> > >> > > > > > >> > >> > > > > > >> > so quering only this server I cannot guess which is the >> > current >> > > > > > leader, >> > > > > > >> the >> > > > > > >> > only information I can extract is: >> > > > > > >> > - I am a follower >> > > > > > >> > - We are a cluster of three >> > > > > > >> > - Every of the three is a PARTECIPANT (no observers) >> > > > > > >> > >> > > > > > >> > in order to guess which is the leader I have to ask to all >> of >> > > the >> > > > > > three >> > > > > > >> > nodes in the cluster >> > > > > > >> > >> > > > > > >> > Am I missing something ? I am running 3.5.3-BETA >> > > > > > >> > >> > > > > > >> > Enrico >> > > > > > >> > >> > > > > > >> > >> > > > > > >> > >> > > > > > >> > >> > > > > > >> > >> > > > > > >> > >> > > > > > >> > >> > > > > > >> > >> > > > > > >> > >> > > > > > >> > >> > > > > > >> > >> > > > > > >> > >> > > > > > >> > >> > > > > > >> > > >> > > > > > >> > > Patrick >> > > > > > >> > > >> > > > > > >> > > On Wed, May 9, 2018 at 8:02 AM, Enrico Olivelli < >> > > > > > eolive...@gmail.com> >> > > > > > >> > > wrote: >> > > > > > >> > > >> > > > > > >> > > > Thank you Edward >> > > > > > >> > > > >> > > > > > >> > > > I will pack all together and send out a patch as soon >> as I >> > > > have >> > > > > > >> time. >> > > > > > >> > > > I am running 3.5 in production and given than an RC for >> > > 3.5.4 >> > > > is >> > > > > > >> going >> > > > > > >> > to >> > > > > > >> > > > be cut soon I will have to wait for 3.5.5 and I assume >> it >> > > > won't >> > > > > be >> > > > > > >> > > > immediate. >> > > > > > >> > > > >> > > > > > >> > > > Cheers >> > > > > > >> > > > Enrico >> > > > > > >> > > > >> > > > > > >> > > > Il giorno mer 9 mag 2018 alle ore 14:37 Edward Ribeiro >> < >> > > > > > >> > > > edward.ribe...@gmail.com> ha scritto: >> > > > > > >> > > > >> > > > > > >> > > > > Sent before finishing the previous email. Only to >> > > > complement, >> > > > > > the >> > > > > > >> > > > > findLeader() could have been as below, but this >> change >> > is >> > > > > only a >> > > > > > >> > nitty >> > > > > > >> > > > > detail and totally irrelevant to the questions you >> are >> > > > asking. >> > > > > > :) >> > > > > > >> > > > > >> > > > > > >> > > > > /** >> > > > > > >> > > > > * Returns the address of the node we think is the >> > leader. >> > > > > > >> > > > > */ >> > > > > > >> > > > > protected QuorumServer findLeader() { >> > > > > > >> > > > > >> > > > > > >> > > > > // Find the leader by id >> > > > > > >> > > > > long currentLeader = >> self.getCurrentVote().getId(); >> > > > > > >> > > > > >> > > > > > >> > > > > QuorumServer leaderServer = >> > > > > > self.getView().get(currentLeader); >> > > > > > >> > > > > >> > > > > > >> > > > > if (leaderServer == null) { >> > > > > > >> > > > > LOG.warn("Couldn't find the leader with id = >> > {}", >> > > > > > >> > > currentLeader); >> > > > > > >> > > > > } >> > > > > > >> > > > > return leaderServer; >> > > > > > >> > > > > } >> > > > > > >> > > > > >> > > > > > >> > > > > Edward >> > > > > > >> > > > > >> > > > > > >> > > > > On Wed, May 9, 2018 at 9:29 AM, Edward Ribeiro < >> > > > > > >> > > edward.ribe...@gmail.com >> > > > > > >> > > > > >> > > > > > >> > > > > wrote: >> > > > > > >> > > > > >> > > > > > >> > > > > > Hi Enrico, >> > > > > > >> > > > > > >> > > > > > >> > > > > > Well, I am not an expert on QuorumPeer either (not >> an >> > > > expert >> > > > > > on >> > > > > > >> > > > anything, >> > > > > > >> > > > > > really), but maybe it's the variable and method >> below? >> > > > > > >> > > > > > >> > > > > > >> > > > > > ----------------- QuorumPeer ------------------ >> > > > > > >> > > > > > >> > > > > > >> > > > > > /** >> > > > > > >> > > > > > * This is who I think the leader currently is. >> > > > > > >> > > > > > */ >> > > > > > >> > > > > > volatile private Vote currentVote; >> > > > > > >> > > > > > >> > > > > > >> > > > > > public synchronized Vote getCurrentVote(){ >> > > > > > >> > > > > > return currentVote; >> > > > > > >> > > > > > } >> > > > > > >> > > > > > >> > > > > > >> > > > > > --------------------------------------- >> > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > > Then it's a matter of calling >> > > > > > >> quorumPeer.getCurrentVote().getId() >> > > > > > >> > and >> > > > > > >> > > > > > quorumPeer.getServerState()? >> > > > > > >> > > > > > >> > > > > > >> > > > > > Btw, the Learner class has this handy method below >> > (self >> > > > is >> > > > > a >> > > > > > >> > > > > QuorumPeer): >> > > > > > >> > > > > > >> > > > > > >> > > > > > ---------------- Learner -------------------- >> > > > > > >> > > > > > >> > > > > > >> > > > > > /** >> > > > > > >> > > > > > * Returns the address of the node we think is the >> > > leader. >> > > > > > >> > > > > > */ >> > > > > > >> > > > > > protected QuorumServer findLeader() { >> > > > > > >> > > > > > QuorumServer leaderServer = null; >> > > > > > >> > > > > > // Find the leader by id >> > > > > > >> > > > > > Vote current = self.getCurrentVote(); >> > > > > > >> > > > > > for (QuorumServer s : self.getView().values()) >> { >> > > > > > >> > > > > > if (s.id == current.getId()) { >> > > > > > >> > > > > > leaderServer = s; >> > > > > > >> > > > > > break; >> > > > > > >> > > > > > } >> > > > > > >> > > > > > } >> > > > > > >> > > > > > if (leaderServer == null) { >> > > > > > >> > > > > > LOG.warn("Couldn't find the leader with id >> = " >> > > > > > >> > > > > > + current.getId()); >> > > > > > >> > > > > > } >> > > > > > >> > > > > > return leaderServer; >> > > > > > >> > > > > > } >> > > > > > >> > > > > > >> > > > > > >> > > > > > --------------------------------------- >> > > > > > >> > > > > > >> > > > > > >> > > > > > By the way, as a side note, the map traversal >> could be >> > > > > changed >> > > > > > >> by: >> > > > > > >> > > > > > >> > > > > > >> > > > > > ---------------------------- >> > > > > > >> > > > > > >> > > > > > >> > > > > > if (self.getView().contains(current.getId()) { >> > > > > > >> > > > > > >> > > > > > >> > > > > > } >> > > > > > >> > > > > > >> > > > > > >> > > > > > --------------------------- >> > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > > You can see above this method the >> quorumPeer.getView() >> > > > > > returns a >> > > > > > >> > > > Map<sid, >> > > > > > >> > > > > > QuorumServer> as below: >> > > > > > >> > > > > > >> > > > > > >> > > > > > -------------QuorumPeer --------- >> > > > > > >> > > > > > >> > > > > > >> > > > > > /** >> > > > > > >> > > > > > * A 'view' is a node's current opinion of the >> > > membership >> > > > of >> > > > > > the >> > > > > > >> > > entire >> > > > > > >> > > > > > * ensemble. >> > > > > > >> > > > > > */ >> > > > > > >> > > > > > public Map<Long,QuorumPeer.QuorumServer> getView() >> { >> > > > > > >> > > > > > return Collections.unmodifiableMap( >> > > > getQuorumVerifier(). >> > > > > > >> > > > > > getAllMembers()); >> > > > > > >> > > > > > } >> > > > > > >> > > > > > >> > > > > > >> > > > > > ----------------------------- >> > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > > And then it retrieves the QuorumServer that has >> many >> > > more >> > > > > > >> > information >> > > > > > >> > > > > > about the node besides the sid (InetSocketAddress, >> > > > hostname, >> > > > > > >> etc). >> > > > > > >> > :) >> > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > > Cheers, >> > > > > > >> > > > > > Edward >> > > > > > >> > > > > > >> > > > > > >> > > > > > On Wed, May 9, 2018 at 8:50 AM, Enrico Olivelli < >> > > > > > >> > eolive...@gmail.com >> > > > > > >> > > > >> > > > > > >> > > > > > wrote: >> > > > > > >> > > > > > >> > > > > > >> > > > > >> So I am trying to create a patch in order to >> expose >> > on >> > > > JMX >> > > > > > the >> > > > > > >> id >> > > > > > >> > of >> > > > > > >> > > > the >> > > > > > >> > > > > >> current "leader" (on the JVM of a follower) >> > > > > > >> > > > > >> >> > > > > > >> > > > > >> I am trying to find in ZK which is the variable >> which >> > > > holds >> > > > > > >> the ID >> > > > > > >> > > of >> > > > > > >> > > > > the >> > > > > > >> > > > > >> current leader. >> > > > > > >> > > > > >> I am new to the internal of QuorumPeer >> > > > > > >> > > > > >> >> > > > > > >> > > > > >> Can someone give me some hint ? >> > > > > > >> > > > > >> >> > > > > > >> > > > > >> Enrico >> > > > > > >> > > > > >> >> > > > > > >> > > > > >> Il giorno mar 8 mag 2018 alle ore 10:08 Ansel >> > > Zandegran < >> > > > > > >> > > > > >> ansel.zandeg...@infor.com> ha scritto: >> > > > > > >> > > > > >> >> > > > > > >> > > > > >> > Hi, >> > > > > > >> > > > > >> > That is possible with 4 letter commands. We are >> > using >> > > > it >> > > > > > >> now. In >> > > > > > >> > > > 3.5.x >> > > > > > >> > > > > >> it >> > > > > > >> > > > > >> > is going to be removed in favour of admin server >> > > > > (embedded >> > > > > > >> web >> > > > > > >> > > > > server). >> > > > > > >> > > > > >> > We are running in an environment where it’s not >> > > > possible >> > > > > to >> > > > > > >> run >> > > > > > >> > > JMX >> > > > > > >> > > > or >> > > > > > >> > > > > >> > embedded web servers. >> > > > > > >> > > > > >> > >> > > > > > >> > > > > >> > So I am wondering if there is another way? It >> would >> > > be >> > > > > nice >> > > > > > >> to >> > > > > > >> > > have >> > > > > > >> > > > > this >> > > > > > >> > > > > >> > info as a znode. >> > > > > > >> > > > > >> > >> > > > > > >> > > > > >> > Best regards, >> > > > > > >> > > > > >> > Ansel >> > > > > > >> > > > > >> > >> > > > > > >> > > > > >> > > On 8 May 2018, at 09:55, Flavio Junqueira < >> > > > > > f...@apache.org> >> > > > > > >> > > wrote: >> > > > > > >> > > > > >> > > >> > > > > > >> > > > > >> > > Hi Enrico, >> > > > > > >> > > > > >> > > >> > > > > > >> > > > > >> > > You can determine the state of a server it via >> > > > 4-letter >> > > > > > >> > > commands. >> > > > > > >> > > > > >> Would >> > > > > > >> > > > > >> > that work for you? >> > > > > > >> > > > > >> > > >> > > > > > >> > > > > >> > > -Flavio >> > > > > > >> > > > > >> > > >> > > > > > >> > > > > >> > >> On 8 May 2018, at 09:09, Enrico Olivelli < >> > > > > > >> > eolive...@gmail.com> >> > > > > > >> > > > > >> wrote: >> > > > > > >> > > > > >> > >> >> > > > > > >> > > > > >> > >> Hi, >> > > > > > >> > > > > >> > >> is there any way to see in JMX which is the >> > leader >> > > > of >> > > > > a >> > > > > > >> > > ZooKeeper >> > > > > > >> > > > > >> > cluster? >> > > > > > >> > > > > >> > >> >> > > > > > >> > > > > >> > >> My problem is: given access to any of the >> nodes >> > of >> > > > the >> > > > > > >> > cluster >> > > > > > >> > > I >> > > > > > >> > > > > >> want to >> > > > > > >> > > > > >> > >> know from JMX which is the current leader. >> > > > > > >> > > > > >> > >> It seems to me that this information is not >> > > > available, >> > > > > > you >> > > > > > >> > can >> > > > > > >> > > > know >> > > > > > >> > > > > >> > only if >> > > > > > >> > > > > >> > >> the local node is Leader or Follower. >> > > > > > >> > > > > >> > >> >> > > > > > >> > > > > >> > >> Cheers >> > > > > > >> > > > > >> > >> Enrico >> > > > > > >> > > > > >> > > >> > > > > > >> > > > > >> > >> > > > > > >> > > > > >> > >> > > > > > >> > > > > >> >> > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > >> > > > > > >> > > > >> > > > > > >> > > >> > > > > > >> > >> > > > > > >> >> > > > > > > -- >> > > > > > > >> > > > > > > >> > > > > > > -- Enrico Olivelli >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > -- >> > >> > >> > -- Enrico Olivelli >> > >> > -- > > > -- Enrico Olivelli >