Github user HeartSaVioR commented on the issue:
https://github.com/apache/storm/pull/2433
@danny0405
OK. I have been switching context between multiple projects/issues in recent
days, so I may have missed details of this patch, or may even be confused about
the current implementation (sad).
Could you verify if I understand your statements correctly?
1. Older workers already write heartbeats to local state, and the newer
Supervisor can leverage them to report to the newer Nimbus, hence no additional
work is needed there.
2. Older workers can't report their heartbeats to the newer Nimbus directly,
hence the newer Nimbus can't get older workers' heartbeats if the newer
Supervisor is down. <= This would be a major difference between older workers
and newer workers for this patch.
If my understanding is right, looking into ZK as a fallback mechanism (for
workers whose supervisor is down) might still make sense for old workers,
though that work would not be easy.
If it is a hard requirement, let's not be smart for old workers. If we can
identify that the topology version is under 2.0.0, just ignore the heartbeats
the supervisor is reporting and read heartbeats from ZK. This will get rid of
the headache of aggregating between supervisor RPC and ZK.
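
To make the last suggestion concrete, here is a rough sketch of the selection
logic, written in Python for brevity. It is not Storm code: `parse_version`,
`is_old_topology`, `select_heartbeats`, and the two reader callbacks are
hypothetical names, and it assumes the topology version is available as a
dotted string such as "1.2.3" or "2.0.0-SNAPSHOT".

```python
from typing import Callable, Dict, Tuple

# Hypothetical shape: worker id -> last heartbeat timestamp (seconds).
Heartbeats = Dict[str, float]


def parse_version(version: str) -> Tuple[int, int, int]:
    """Turn "1.2.3-SNAPSHOT" into (1, 2, 3); missing parts count as 0."""
    core = version.split("-", 1)[0]           # drop any "-SNAPSHOT" style suffix
    parts = [int(p) for p in core.split(".")[:3]]
    parts += [0] * (3 - len(parts))           # pad "2.0" to (2, 0, 0)
    return (parts[0], parts[1], parts[2])


def is_old_topology(version: str) -> bool:
    """True when the topology was submitted with a version under 2.0.0."""
    return parse_version(version) < (2, 0, 0)


def select_heartbeats(
    topology_version: str,
    read_from_zk: Callable[[], Heartbeats],
    read_from_supervisor_rpc: Callable[[], Heartbeats],
) -> Heartbeats:
    """Pick exactly one heartbeat source per topology; never merge the two."""
    if is_old_topology(topology_version):
        # Old workers: ignore whatever the supervisor reported, trust ZK only.
        return read_from_zk()
    # New workers: use heartbeats reported over RPC.
    return read_from_supervisor_rpc()
```

The point of the sketch is only that each topology has a single source, chosen
by version, so no aggregation between supervisor RPC and ZK is needed.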
---