[
https://issues.apache.org/jira/browse/STORM-1394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15333620#comment-15333620
]
Nico Meyer commented on STORM-1394:
-----------------------------------
And yes, it probably was fixed in STORM-1609. But I would have preferred to
register a handler for the closeFuture of the newly created channel, since that
eliminates the dead time until the next check, which can be up to 30 seconds away.
{code}
@@ -487,6 +487,21 @@ public class Client extends ConnectionWithStatus implements IStatefulObject {
                 if (future.isSuccess() && connectionEstablished(newChannel)) {
                     connectionAttempts.set(0);
+                    newChannel.getCloseFuture().addListener(
+                        new ChannelFutureListener() {
+                            @Override
+                            public void operationComplete(ChannelFuture future) throws Exception {
+                                Throwable cause = future.getCause();
+                                String causeStr = "Closed by peer";
+
+                                if(cause != null) {
+                                    causeStr = cause.toString();
+                                }
+
+                                LOG.warn("Connection to {}, unexpectedly closed. Cause: {}", address.toString(), causeStr);
+                                closeChannelAndReconnect(future.getChannel());
+                            }});
+
                     boolean setChannel = channelRef.compareAndSet(null, newChannel);
                     checkState(setChannel);
                     LOG.debug("successfully connected to {}, {} [attempt {}]", address.toString(), newChannel.toString(),
{code}
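The idea behind the patch (reconnect from the close callback itself instead of waiting for the next check) can be sketched without Netty. This is a minimal, self-contained sketch under stated assumptions: `java.util.concurrent.CompletableFuture` stands in for the channel's close future (it is not Netty's `ChannelFuture`), and `FakeChannel`, `ReconnectingClient`, `simulateClose`, and `connects` are hypothetical names used only for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

public class CloseListenerSketch {

    /** Hypothetical stand-in for a channel; only its close future matters here. */
    static final class FakeChannel {
        final int id;
        final CompletableFuture<Void> closeFuture = new CompletableFuture<>();

        FakeChannel(int id) {
            this.id = id;
        }

        /** Simulates the connection being closed, with or without a cause. */
        void simulateClose(Throwable cause) {
            if (cause == null) {
                closeFuture.complete(null);
            } else {
                closeFuture.completeExceptionally(cause);
            }
        }
    }

    /** Hypothetical client that reconnects as soon as its channel's close future fires. */
    static final class ReconnectingClient {
        final AtomicReference<FakeChannel> channelRef = new AtomicReference<>();
        final AtomicInteger connects = new AtomicInteger();
        volatile boolean closing = false;

        void connect() {
            FakeChannel ch = new FakeChannel(connects.incrementAndGet());
            if (!channelRef.compareAndSet(null, ch)) {
                throw new IllegalStateException("a channel is already set");
            }
            // Register after publishing: if ch is already closed, whenComplete runs the
            // callback immediately and closeChannelAndReconnect still finds ch in channelRef.
            ch.closeFuture.whenComplete((ignored, cause) -> {
                if (closing) {
                    return; // deliberate shutdown, not an unexpected close
                }
                String causeStr = (cause == null) ? "Closed by peer" : cause.toString();
                System.out.println("Connection " + ch.id + " unexpectedly closed. Cause: " + causeStr);
                closeChannelAndReconnect(ch);
            });
        }

        void closeChannelAndReconnect(FakeChannel ch) {
            // Clear only if ch is still the current channel, so a stale close event
            // cannot tear down a newer connection; then reconnect without waiting.
            if (channelRef.compareAndSet(ch, null)) {
                connect();
            }
        }

        void close() {
            closing = true;
            FakeChannel ch = channelRef.getAndSet(null);
            if (ch != null) {
                ch.simulateClose(null);
            }
        }
    }
}
```

The sketch publishes the channel before registering the listener. Because `whenComplete` runs its callback right away when the future has already completed, a channel that closes in between is still caught by `compareAndSet(ch, null)`; with the opposite order, that immediate callback would find `channelRef` still null and skip the reconnect.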
> Netty Client never continue reconnection when worker started a moment ago.
> --------------------------------------------------------------------------
>
> Key: STORM-1394
> URL: https://issues.apache.org/jira/browse/STORM-1394
> Project: Apache Storm
> Issue Type: Bug
> Components: storm-core
> Reporter: Jingsong Lee
>
> A worker waits until all of its connections are ready.
> The following sequence can make a worker hang forever:
> 1. Workers A and B start.
> 2. Worker A waits until all of its connections are ready.
> 3. Worker A connects to B.
> 4. B dies while A is not yet active (A's other connections are not ready).
> 5. B's supervisor launches B again (the assignment is unchanged).
> 6. A hangs forever, because nothing reconnects A's client to B.
> We can fix this problem in one of two ways:
> 1. Add a closeChannelAndReconnect call to Client's status
> Or
> 2. Add a closeChannelAndReconnect call to StormClientHandler's exceptionCaught
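To make the hang in steps 1-6 and the first proposed fix concrete, here is a deterministic Java simulation. It is a sketch under stated assumptions: `SketchClient`, `peerUp`, `reconnectOnStatus`, `waitUntilReady`, and the `Status` values are hypothetical stand-ins, not Storm's `Client` or `ConnectionWithStatus` API. With `reconnectOnStatus` off, the wait loop never sees READY after B is relaunched; with it on, the first status poll reconnects.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class StatusReconnectSketch {

    /** Hypothetical connection states for this sketch only. */
    enum Status { CONNECTING, READY }

    /** Hypothetical client; peerUp simulates whether worker B is reachable right now. */
    static final class SketchClient {
        final boolean reconnectOnStatus;
        final AtomicBoolean connected = new AtomicBoolean(false);
        final AtomicInteger connectAttempts = new AtomicInteger();
        volatile boolean peerUp = true;

        SketchClient(boolean reconnectOnStatus) {
            this.reconnectOnStatus = reconnectOnStatus;
        }

        void connect() {
            connectAttempts.incrementAndGet();
            connected.set(peerUp); // succeeds only if the peer is currently reachable
        }

        /** Step 4: B dies and the existing channel is lost; nothing reconnects it. */
        void peerDied() {
            peerUp = false;
            connected.set(false);
        }

        /** Step 5: B's supervisor relaunches B with the same assignment. */
        void peerRelaunched() {
            peerUp = true;
        }

        Status status() {
            if (!connected.get() && reconnectOnStatus) {
                // Option 1: polling status() drives the reconnect itself. This is done
                // inline to keep the sketch deterministic; a real client would likely
                // schedule the reconnect instead of connecting on the caller's thread.
                connect();
            }
            return connected.get() ? Status.READY : Status.CONNECTING;
        }
    }

    /** Worker A's wait loop (step 2), bounded here so the sketch always terminates. */
    static boolean waitUntilReady(SketchClient client, int maxPolls) {
        for (int i = 0; i < maxPolls; i++) {
            if (client.status() == Status.READY) {
                return true;
            }
        }
        return false;
    }
}
```

A real wait loop would sleep between polls and have no fixed bound; the bound exists only so the "hang" case returns `false` instead of spinning forever.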
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)