GitHub user HeartSaVioR commented on the pull request:
https://github.com/apache/storm/pull/795#issuecomment-147896620
@kishorvpatil
To confirm that I understand this issue correctly, could you elaborate on it
a bit more?
As far as I understand, when the retry count is exceeded, Connect.run() throws
a RuntimeException, but the worker isn't killed because Connect is a TimerTask.
So it just closes the connection and waits for that worker to be reassigned.
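To illustrate why an exception thrown from a timer-driven task does not bring down the process, here is a minimal sketch. It is not Storm's Client code: it assumes the connect logic runs on a scheduler thread, and it uses java.util.concurrent's ScheduledExecutorService as a stand-in for whatever timer actually runs Connect. The method connectWithExhaustedRetries is hypothetical. The exception is captured in the task's future and only surfaces if someone calls get(); the calling thread keeps running either way.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class TimerTaskFailureDemo {
    // Hypothetical stand-in for a connect attempt that gives up after exhausting retries.
    static void connectWithExhaustedRetries() {
        throw new RuntimeException("Giving up reconnecting: retry limit exceeded");
    }

    /** Returns the message of the exception the scheduled task threw, or null if it did not throw. */
    static String runOnScheduler() {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        try {
            ScheduledFuture<?> future = scheduler.schedule(
                TimerTaskFailureDemo::connectWithExhaustedRetries, 10, TimeUnit.MILLISECONDS);
            future.get();
            return null; // task completed normally
        } catch (ExecutionException e) {
            // The RuntimeException is wrapped in the future; it never propagates to
            // this thread on its own, so the process (the "worker") stays alive.
            return e.getCause().getMessage();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        } finally {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) {
        String error = runOnScheduler();
        System.out.println("Scheduled task failed with: " + error);
        System.out.println("Main thread is still running.");
    }
}
```

If nothing ever calls get() on the future, the failure is effectively silent, which matches the "connection closed, worker still alive" state described above.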
If Nimbus reassigns the dead worker elsewhere after the retry limit is exceeded,
a new connection is made and everything is fine.
But if, for some reason, the problematic worker is unable to connect to the
other workers (for example, due to a stop-the-world (STW) GC pause) for longer
than the connection retry period, though not forever, and Nimbus doesn't
reassign the problematic worker, then the other workers can never connect to it.
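The scenario above can be reduced to a toy model: a client retries a fixed number of times, and the peer is unreachable for a finite outage. This is a hypothetical sketch, not Storm code; the names maxRetries and outageTicks and the one-attempt-per-tick model are assumptions for illustration only. It shows that an outage shorter than the retry budget recovers, while a finite outage that outlasts the budget leaves the client permanently disconnected unless something outside the retry loop (such as a reassignment) intervenes.

```java
public class BoundedRetryScenario {
    /**
     * Hypothetical model: the client makes one connection attempt per tick, up to
     * maxRetries attempts. The peer is unreachable for the first outageTicks ticks
     * (e.g. a long stop-the-world pause) and reachable afterwards.
     * Returns true if the client ever connects.
     */
    static boolean eventuallyConnects(int maxRetries, int outageTicks) {
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            boolean peerReachable = attempt >= outageTicks;
            if (peerReachable) {
                return true;
            }
        }
        // Retry budget exhausted: the client gives up and, absent a reassignment,
        // never tries again, even though the peer does come back later.
        return false;
    }

    public static void main(String[] args) {
        // Outage (3 ticks) shorter than the retry budget (5 attempts): recovers.
        System.out.println(eventuallyConnects(5, 3));
        // Outage (8 ticks) finite but longer than the budget: never recovers.
        System.out.println(eventuallyConnects(5, 8));
    }
}
```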
Is my assumption right, or is there another reason?