[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15297388#comment-15297388
 ] 

Chris Nauroth commented on ZOOKEEPER-2366:
------------------------------------------

[~shralex], thank you for your comments.  I missed a lot of subtlety in the 
error handling here when I reviewed.

bq. At that point, the operation is committed for all practical purposes 
(quorum already accepted), there is no way to abort it. What can we do without 
redesigning ZK?

Thinking out loud, so I'm probably missing something, but could we bind to the 
new port during the proposal while keeping the old port listening?  Then, 
during commit, we'd transition the accept thread to the already-bound new port 
and close the old one.  The effect I'm trying to achieve is moving the bind 
failure to proposal time, so the proposal won't get ack'd as successful, and 
therefore the quorum won't accept it.

Of course, this is a much heavier change and maybe strays towards "redesigning 
ZK".
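
To make the bind-at-proposal idea above concrete, here is a minimal, purely 
illustrative sketch. It is not ZooKeeper code: the class TwoPhaseRebind and 
its methods prepare, commit, abort and activePort are hypothetical names, and 
it ignores the accept thread and selector handling that the real 
NIOServerCnxnFactory would need. It only shows that the new socket can be 
bound (and a failure reported) before the old one is touched.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;

// Hypothetical helper, not part of ZooKeeper: binds the new port at
// "proposal" time and only swaps it in at "commit" time.
final class TwoPhaseRebind {
    private ServerSocketChannel active;   // socket currently serving clients
    private ServerSocketChannel pending;  // bound during proposal, not yet serving

    TwoPhaseRebind(InetSocketAddress initial) throws IOException {
        this.active = bind(initial);
    }

    // Open and bind a channel; close it again if any step fails so that
    // a failed attempt never leaks a socket.
    private static ServerSocketChannel bind(InetSocketAddress addr) throws IOException {
        ServerSocketChannel ch = ServerSocketChannel.open();
        try {
            ch.socket().bind(addr);
            ch.configureBlocking(false);
            return ch;
        } catch (IOException e) {
            ch.close();
            throw e;
        }
    }

    // Proposal phase: try to bind the new port while the old one keeps
    // listening. A false return is the point where a server would
    // decline to ack the proposal.
    synchronized boolean prepare(InetSocketAddress addr) {
        abort(); // discard any stale pending socket from an earlier proposal
        try {
            pending = bind(addr);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    // Commit phase: switch to the already-bound socket, then close the old one.
    synchronized void commit() throws IOException {
        if (pending == null) {
            throw new IllegalStateException("commit without a successful prepare");
        }
        ServerSocketChannel old = active;
        active = pending;
        pending = null;
        old.close();
    }

    // Proposal was dropped: release the pending socket, keep the old port.
    synchronized void abort() {
        if (pending != null) {
            try {
                pending.close();
            } catch (IOException ignored) {
                // nothing useful to do; the old port is unaffected
            }
            pending = null;
        }
    }

    synchronized int activePort() {
        return active.socket().getLocalPort();
    }
}
```

In this sketch a failed prepare leaves the old port untouched, so a rollback 
is never needed. The cost is that each server holds two listening sockets 
between proposal and commit.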

bq. so either we log and proceed or we exit the server.

Thinking about the operational model, I could see "exit the server" easily 
escalating to "exit the ensemble."  If the admin accidentally reconfigs to a 
commonly bound port, such as 8080 for an HTTP server, then there is a strong 
possibility that all servers in the ensemble would hit the bind failure and 
exit.

> Reconfiguration of client port causes a socket leak
> ---------------------------------------------------
>
>                 Key: ZOOKEEPER-2366
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2366
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.5.0
>            Reporter: Timothy Ward
>            Assignee: Flavio Junqueira
>            Priority: Blocker
>             Fix For: 3.5.2
>
>         Attachments: ZOOKEEPER-2366.patch, ZOOKEEPER-2366.patch, 
> zookeeper.patch
>
>
> The NIOServerCnxnFactory reconfigure method can leak server sockets, and 
> hence make ports unusable until the JVM restarts:
> The first line of the method takes a reference to the current 
> ServerSocketChannel and then the next line replaces it. The subsequent 
> interactions with the server socket can fail (for example if the 
> reconfiguration tries to bind to an in-use port). If they fail *before* the  
> call to oldSS.close() then oldSS is *never* closed. This holds that port open 
> forever, and prevents the user from rolling back to the previous port!
> The code from reconfigure is shown below:
>     ServerSocketChannel oldSS = ss;
>     try {
>         this.ss = ServerSocketChannel.open();
>         ss.socket().setReuseAddress(true);
>         LOG.info("binding to port " + addr);
>         ss.socket().bind(addr);
>         ss.configureBlocking(false);
>         acceptThread.setReconfiguring();
>         oldSS.close();
>         acceptThread.wakeupSelector();
>         try {
>             acceptThread.join();
>         } catch (InterruptedException e) {
>             LOG.error("Error joining old acceptThread when reconfiguring client port " + e.getMessage());
>         }
>         acceptThread = new AcceptThread(ss, addr, selectorThreads);
>         acceptThread.start();
>     } catch(IOException e) {
>         LOG.error("Error reconfiguring client port to " + addr + " " + e.getMessage());
>     }


