[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15295575#comment-15295575
 ] 

Flavio Junqueira commented on ZOOKEEPER-2366:
---------------------------------------------

That's good insight.

bq. At that point, the operation is committed for all practical purposes 
(quorum already accepted), there is no way to abort it. What can we do without 
redesigning ZK ?

I'd say that this part is fine, I'm ok with letting it commit. I'm less ok with 
the server that is supposed to change the port failing to do it and keep 
operating regularly. For example, if we are changing the port of a follower and 
the port change fails, then I'd say that the follower should stop following, 
and a bit more drastically perhaps even exit.

bq. The alternative is perhaps to bring down that server because its state 
can't reflect a committed operation, but I think its an overkill.

I actually think this is a problem. If the user changes something on the 
server, then the user expects to either see the change reflected or an 
immediate indication of an error. Granted that we are printing an error 
message, but I think we can do better than that to call the admin attention to 
the problem right away.

bq. the test scenario is trying to change a follower's port, so processReconfig 
fails to change it at the follower, the leader will execute processReconfig 
just fine, since all it does is update the config in memory. Returning false in 
Leader.java is wrong. My comments can probably be better there, but false has a 
different meaning - it indicates that there are not enough acks. Operations do 
not fail at the commit stage.

Got it, and I don't claim that bit is correct, I'll have a closer look at the 
Leader code.

bq. The test can be improved a bit to also try to reconfig the leader's client 
port and see what happens when its taken. Right now it only checks for 
follower. Its possible to do that without adding too much code if you do the 
flow in a loop - once for leader and once for follower. I think its also ok 
without this extension though.

Ok.
 

> Reconfiguration of client port causes a socket leak
> ---------------------------------------------------
>
>                 Key: ZOOKEEPER-2366
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2366
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.5.0
>            Reporter: Timothy Ward
>            Assignee: Flavio Junqueira
>            Priority: Blocker
>             Fix For: 3.5.2
>
>         Attachments: ZOOKEEPER-2366.patch, ZOOKEEPER-2366.patch, 
> zookeeper.patch
>
>
> The NIOServerCnxnFactory reconfigure method can leak server sockets, and 
> hence make ports unusable until the JVM restarts:
> The first line of the method takes a reference to the current 
> ServerSocketChannel and then the next line replaces it. The subsequent 
> interactions with the server socket can fail (for example if the 
> reconfiguration tries to bind to an in-use port). If they fail *before* the  
> call to oldSS.close() then oldSS is *never* closed. This holds that port open 
> forever, and prevents the user from rolling back to the previous port!
> The code from reconfigure is shown below:
>  ServerSocketChannel oldSS = ss;        
>         try {
>            this.ss = ServerSocketChannel.open();
>            ss.socket().setReuseAddress(true);
>            LOG.info("binding to port " + addr);
>            ss.socket().bind(addr);
>            ss.configureBlocking(false);
>            acceptThread.setReconfiguring();
>            oldSS.close();           
>            acceptThread.wakeupSelector();
>            try {
>                         acceptThread.join();
>                  } catch (InterruptedException e) {
>                          LOG.error("Error joining old acceptThread when 
> reconfiguring client port " + e.getMessage());
>                  }
>            acceptThread = new AcceptThread(ss, addr, selectorThreads);
>            acceptThread.start();
>         } catch(IOException e) {
>            LOG.error("Error reconfiguring client port to " + addr + " " + 
> e.getMessage());
>         }



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to