Github user tedxia commented on the pull request:
https://github.com/apache/storm/pull/268#issuecomment-61752674
I tested this patch on our production cluster, which has five machines with
at most 6 workers each.
The Trident-based topology ran for about 5 hours without failures.
Then I killed one worker, called A, and found the log below on worker B.
Worker B did not exit when worker A died.
```
2014-11-04 17:18:08 b.s.m.n.Client [INFO] Reconnect started for
Netty-Client-A/xxx.xxx.xxx.xxx:21812... [47]
2014-11-04 17:18:12 b.s.m.n.Client [INFO] Reconnect started for
Netty-Client-A/xxx.xxx.xxx.xxx:21812... [48]
2014-11-04 17:18:16 b.s.m.n.Client [INFO] Reconnect started for
Netty-Client-A/xxx.xxx.xxx.xxx:21812... [49]
2014-11-04 17:18:20 b.s.m.n.Client [INFO] Reconnect started for
Netty-Client-A/xxx.xxx.xxx.xxx:21812... [50]
2014-11-04 17:18:24 b.s.m.n.Client [INFO] Closing Netty Client
Netty-Client-A/xxx.xxx.xxx.xxx:21812
2014-11-04 17:18:24 b.s.m.n.Client [INFO] Waiting for pending batchs to be
sent with Netty-Client-A/xxx.xxx.xxx.xxx:21812..., timeout: 600000ms, pendings: 0
2014-11-04 17:18:24 b.s.m.n.Client [INFO] Client is being closed, and does
not take requests any more, drop the messages...
2014-11-04 17:18:24 b.s.m.n.Client [INFO] Client is being closed, and does
not take requests any more, drop the messages...
```
Because worker A died, Nimbus scheduled a new worker, F, and worker B then
connected to worker F.
```
2014-11-04 17:16:53 b.s.m.n.Client [INFO] Reconnect started for
Netty-Client-A/xxx.xxx.xxx.xxx:21812... [21]
2014-11-04 17:16:54 b.s.m.n.Client [INFO] Reconnect started for
Netty-Client-F/xxx.xxx.xxx.xxx:21813... [17]
2014-11-04 17:16:54 b.s.m.n.Client [INFO] connection established to a
remote host Netty-Client-F/xxx.xxx.xxx.xxx:21813, [id: 0xbf721a18,
/xxx.xxx.xxx.xxx:63811 => F/xxx.xxx.xxx.xxx:21813]
2014-11-04 17:16:55 b.s.m.n.Client [INFO] Reconnect started for
Netty-Client-A/10.2.201.65:21812... [22]
```
Worker B connected to worker F successfully before it closed its connection
to worker A.
Because this is our production cluster, I have rewritten the hostnames and
IPs in the log.
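
To make the behaviour in the first log easier to follow, here is a minimal Java sketch of a bounded reconnect loop. It is not the code of `b.s.m.n.Client`; the class `ReconnectSketch`, its methods, and the retry limit of 50 (chosen only to mirror the counter in the log) are assumptions for illustration. The idea it shows is that after the last failed attempt the client closes itself and drops further messages, while the worker process keeps running and can still talk to other peers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical sketch for illustration; not Storm's Netty client.
public class ReconnectSketch {

    enum Outcome { CONNECTED, CLOSED }

    static final class BoundedReconnector {
        private final String remote;
        private final int maxRetries;   // assumed limit; 50 mirrors the log counter
        private final List<String> events = new ArrayList<>();
        private boolean closed = false;
        private int dropped = 0;

        BoundedReconnector(String remote, int maxRetries) {
            this.remote = remote;
            this.maxRetries = maxRetries;
        }

        // Try to connect up to maxRetries times. On failure, close only this
        // client; the surrounding process is left running.
        Outcome connect(BooleanSupplier tryConnect) {
            for (int attempt = 1; attempt <= maxRetries; attempt++) {
                events.add("Reconnect started for " + remote + "... [" + attempt + "]");
                if (tryConnect.getAsBoolean()) {
                    events.add("connection established to " + remote);
                    return Outcome.CONNECTED;
                }
                // A real client would wait here before the next attempt.
            }
            closed = true;
            events.add("Closing " + remote);
            return Outcome.CLOSED;
        }

        // Returns true if the message was accepted, false if it was dropped
        // because this client has already been closed.
        boolean send(String message) {
            if (closed) {
                dropped++;
                return false;
            }
            events.add("sent " + message + " to " + remote);
            return true;
        }

        int droppedCount() { return dropped; }

        List<String> events() { return events; }
    }

    public static void main(String[] args) {
        BoundedReconnector toA = new BoundedReconnector("Netty-Client-A", 50);
        BoundedReconnector toF = new BoundedReconnector("Netty-Client-F", 50);

        System.out.println(toA.connect(() -> false)); // A never comes back
        System.out.println(toF.connect(() -> true));  // F is reachable
        System.out.println(toA.send("tuple-1"));      // dropped by the closed client
        System.out.println(toF.send("tuple-1"));      // accepted
    }
}
```

In this sketch the two clients are independent, which matches what I saw: the connection to F was established while the client for A was still retrying.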