Github user revans2 commented on the issue:

    https://github.com/apache/storm/pull/2365
  
    Actually I have dug deeper, and I think we want to do this differently, 
although it will be just as simple of a change.
    
    Currently a ConnectException is thrown deep down that end up being wrapped 
and bubbles up to be a RuntimeException which you are catching and simply not 
adding that client to the list of clients.  The issue I am concerned about is 
what happens if the node is temporarily down and will come back up.  Before 
this would cause the worker to restart until it was back up (not ideal but 
would work).  With this patch we will end up never having the spout connect to 
that drpc server.
    
    I think the proper fix is actually here
    
    
https://github.com/liu-zhaokun/storm/blob/d92f1a9c8d7442d4959fec57813fc5de42b179a9/storm-client/src/jvm/org/apache/storm/drpc/DRPCInvocationsClient.java#L44
    
    If instead of letting the Exception bubble up and be handled by the Spout 
we instead will catch/log it and set the AtomicReference to null.  This will 
allow the spout to do its reconnect logic periodically to try and recover, but 
will still keep the spout up and running if a node is completely removed.
    
    Longer term it might be interesting to try and replace the config with a 
location in zookeeper where the DRPC servers can register, so we can add/remove 
DRPC servers as needed and the system automatically adjusts.


---

Reply via email to