Github user HeartSaVioR commented on the issue:
https://github.com/apache/storm/pull/1642
I found another issue:
When I rebalance 3 workers into 1 worker, all workers are killed first
(expected), and AsyncLocalizer clears out the topology code since all workers
are killed.
But the Supervisor doesn't download the topology code again when starting the
new worker, so it goes wrong and the worker and/or the supervisor is killed.
This seems to be a kind of race condition (between Slot and AsyncLocalizer),
and I saw two scenarios:
1. The worker is launched, but the topology directory is removed after the
launch, so the worker crashes. Slot tries to relaunch the worker but throws
IllegalStateException because the topology directory is gone, and the
supervisor is killed as well.
2. The supervisor is killed even before launching the worker.
After this, the Supervisor will consistently be killed unless the supervisor
directory is cleared out, same as in the comment above.
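
To illustrate the ordering problem, here is a minimal sketch of one way the
cleanup and the relaunch could be serialized. It is not Storm's actual
AsyncLocalizer or Slot code: TopologyCodeTracker, acquire, release and the
in-memory "localDirs" set are all hypothetical names made up for this example,
and the real download is replaced by a placeholder. The idea is only that
removing the topology code and pinning it for a new worker go through the same
lock, and that a launch always re-checks whether the code is present.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for the localizer; NOT Storm's actual AsyncLocalizer API. */
class TopologyCodeTracker {
    // Number of workers currently using each topology's code.
    private final Map<String, Integer> refCounts = new HashMap<>();
    // Simulates which topology directories exist on local disk (assumption).
    private final Set<String> localDirs = new HashSet<>();
    private int downloads = 0;

    /** Call before launching a worker: download the code if missing, then pin it. */
    synchronized void acquire(String topologyId) {
        if (!localDirs.contains(topologyId)) {
            localDirs.add(topologyId); // placeholder for the real download
            downloads++;
        }
        refCounts.merge(topologyId, 1, Integer::sum);
    }

    /** Call after a worker is killed: remove the code only when no worker uses it. */
    synchronized void release(String topologyId) {
        Integer remaining =
            refCounts.computeIfPresent(topologyId, (id, n) -> n > 1 ? n - 1 : null);
        if (remaining == null) {
            localDirs.remove(topologyId); // last user is gone, clean up
        }
    }

    synchronized boolean isPresent(String topologyId) {
        return localDirs.contains(topologyId);
    }

    synchronized int downloadCount() {
        return downloads;
    }

    public static void main(String[] args) {
        TopologyCodeTracker tracker = new TopologyCodeTracker();
        String topo = "topo-1"; // hypothetical topology id

        // Three workers running, then a rebalance kills all of them.
        for (int i = 0; i < 3; i++) tracker.acquire(topo);
        for (int i = 0; i < 3; i++) tracker.release(topo);

        // The single new worker pins the code again, which triggers a re-download.
        tracker.acquire(topo);
        System.out.println("present=" + tracker.isPresent(topo)
            + " downloads=" + tracker.downloadCount());
    }
}
```

Because acquire and release are both synchronized on the same object, a cleanup
cannot run between the "is the code there?" check and the pin, which is the
window the two scenarios above seem to hit.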