Github user kevinconaway commented on the pull request:
https://github.com/apache/storm/pull/639#issuecomment-219860409
Actually, it looks like this issue

> There is another, more serious issue that led to huge problems in one of
> our topologies whenever a worker crashed due to some exception.
> If worker A successfully connects to worker B for the first time during
> startup but worker B closes the connection for some reason before the
> :worker-active-flag is set to true (here
> https://github.com/apache/storm/blob/v0.9.6/storm-core/src/clj/backtype/storm/daemon/worker.clj#L356),
> there will be no further reconnect attempts, since no messages will be
> processed and neither send() nor flushMessages() will ever be called.

may be fixed by STORM-1609 with the addition of the client keepalive
TimerTask.
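
To illustrate why a periodic task would help here, below is a minimal sketch and not the actual STORM-1609 change. `SketchClient`, `remoteClose()`, `checkConnection()` and `startKeepAlive()` are hypothetical names made up for this example; only `java.util.Timer` and `java.util.TimerTask` are real APIs. The point is that the reconnect check runs on a timer, so it no longer depends on send() or flushMessages() being called.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a messaging client; NOT Storm's real Client class.
class SketchClient {
    private final AtomicBoolean connected = new AtomicBoolean(true);
    private final AtomicInteger reconnects = new AtomicInteger();

    // Simulates worker B closing the connection before any message is sent.
    void remoteClose() {
        connected.set(false);
    }

    boolean isConnected() {
        return connected.get();
    }

    int reconnectCount() {
        return reconnects.get();
    }

    // Runs independently of message traffic: if the connection is down,
    // mark it as re-established (a real client would reconnect the channel here).
    void checkConnection() {
        if (connected.compareAndSet(false, true)) {
            reconnects.incrementAndGet();
        }
    }

    // Schedules checkConnection() on a daemon timer every periodMs milliseconds.
    Timer startKeepAlive(long periodMs) {
        Timer timer = new Timer("keepalive-sketch", true);
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                checkConnection();
            }
        }, 0, periodMs);
        return timer;
    }
}
```

With something along these lines, a connection closed before :worker-active-flag is set would still be noticed on the next timer tick, even if no messages are ever processed.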