[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15298372#comment-15298372
 ] 

Chris Nauroth commented on ZOOKEEPER-2366:
------------------------------------------

bq. I just want to point out that there is no voting in Paxos or ZooKeeper.

Absolutely.  As I think more about trying to push the bind into the proposal 
phase, it just opens up more failure modes with no clear solutions.  As you 
said, this is exactly what multi-phase commit addresses, but then we've got a 
very different system on our hands.

I'm slowly coming around to the opinion that we should follow your earlier 
suggestion to leave the error handling as is and simply ensure the old socket 
gets closed to prevent the leak.  This would allow the admin to recover by 
reconfiguring back to the old port (or a different new port).  It's unusual in 
that the server can't reflect committed state, but the alternative of halting 
increases the severity of the problem.  I don't see a cleaner alternative that 
wouldn't cause a huge impact on overall ZooKeeper architecture.

[~fpj], what are your current thoughts?

> Reconfiguration of client port causes a socket leak
> ---------------------------------------------------
>
>                 Key: ZOOKEEPER-2366
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2366
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.5.0
>            Reporter: Timothy Ward
>            Assignee: Flavio Junqueira
>            Priority: Blocker
>             Fix For: 3.5.2
>
>         Attachments: ZOOKEEPER-2366.patch, ZOOKEEPER-2366.patch, 
> zookeeper.patch
>
>
> The NIOServerCnxnFactory reconfigure method can leak server sockets, and 
> hence make ports unusable until the JVM restarts:
> The first line of the method takes a reference to the current 
> ServerSocketChannel and then the next line replaces it. The subsequent 
> interactions with the server socket can fail (for example if the 
> reconfiguration tries to bind to an in-use port). If they fail *before* the  
> call to oldSS.close() then oldSS is *never* closed. This holds that port open 
> forever, and prevents the user from rolling back to the previous port!
> The code from reconfigure is shown below:
>  ServerSocketChannel oldSS = ss;        
>         try {
>            this.ss = ServerSocketChannel.open();
>            ss.socket().setReuseAddress(true);
>            LOG.info("binding to port " + addr);
>            ss.socket().bind(addr);
>            ss.configureBlocking(false);
>            acceptThread.setReconfiguring();
>            oldSS.close();           
>            acceptThread.wakeupSelector();
>            try {
>                         acceptThread.join();
>                  } catch (InterruptedException e) {
>                          LOG.error("Error joining old acceptThread when 
> reconfiguring client port " + e.getMessage());
>                  }
>            acceptThread = new AcceptThread(ss, addr, selectorThreads);
>            acceptThread.start();
>         } catch(IOException e) {
>            LOG.error("Error reconfiguring client port to " + addr + " " + 
> e.getMessage());
>         }



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to