[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065005#comment-14065005
 ] 

Flavio Junqueira commented on ZOOKEEPER-1807:
---------------------------------------------

I checked QCM and it indeed only keeps trying to connect while it is executing 
lookForLeader, so your observation is right. The approach you propose seems 
right to me.

I also checked the test failure and the test that failed is 
NioNettySuiteHammerTest. It timed out, so it is hard to say if it is related to 
this patch or not, although I've seen this same test failure with another patch 
recently. I'm not sure what's causing it, so we might want to consider 
committing this patch and trying to figure out the hammer test problem 
separately.

What do you think? 

> Observers spam each other creating connections to the election addr
> -------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1807
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1807
>             Project: ZooKeeper
>          Issue Type: Bug
>            Reporter: Raul Gutierrez Segales
>            Assignee: Alexander Shraer
>            Priority: Blocker
>             Fix For: 3.5.0
>
>         Attachments: ZOOKEEPER-1807-alex.patch, ZOOKEEPER-1807-ver2.patch, 
> ZOOKEEPER-1807-ver3.patch, ZOOKEEPER-1807-ver4.patch, 
> ZOOKEEPER-1807-ver5.patch, ZOOKEEPER-1807-ver6.patch, ZOOKEEPER-1807.patch, 
> notifications-loop.png
>
>
> Hey [~shralex],
> I noticed today that my Observers are spamming each other trying to open 
> connections to the election port. I've got tons of these:
> {noformat}
> 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a 
> connection already for server 9
> 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a 
> connection already for server 10
> 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a 
> connection already for server 6
> 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a 
> connection already for server 12
> 2013-11-01 22:19:45,819 - DEBUG [WorkerSender[myid=13]] - There is a 
> connection already for server 14
> {noformat}
> and so and so on ad nauseam. 
> Now, looking around I found this inside FastLeaderElection.java from when you 
> committed ZOOKEEPER-107:
> {noformat}
>      private void sendNotifications() {
> -        for (QuorumServer server : self.getVotingView().values()) {
> -            long sid = server.id;
> -
> +        for (long sid : self.getAllKnownServerIds()) {
> +            QuorumVerifier qv = self.getQuorumVerifier();
> {noformat}
> Is that really desired? I suspect that is what's causing Observers to try to 
> connect to each other (as opposed as just connecting to participants). I'll 
> give it a try now and let you know. (Also, we use observer ids that are > 0, 
> and I saw some parts of the code that might not deal with that assumption - 
> so it could be that too..). 



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to