Github user HeartSaVioR commented on the issue:

    https://github.com/apache/storm/pull/2433
  
    @danny0405 
    OK. I have been switching context between multiple projects/issues in recent 
days, so I may have missed details of this patch and may even be confused about 
the current implementation (sad).
    
    Could you verify if I understand your statements correctly?
    
    1. An older worker already writes its heartbeats to local state, and a newer 
supervisor can read them and report them to the newer Nimbus, so no 
additional work is needed there.
    2. An older worker can't report its heartbeat to the newer Nimbus directly, so 
the newer Nimbus can't get older workers' heartbeats if the newer supervisor is 
down. <= This would be a major difference between older and newer workers for 
this patch.
    
    If my understanding is right, looking into ZK as a fallback mechanism (for 
workers whose supervisor is down) might still make sense for old 
workers, though that work would not be easy. 
    If it is a hard requirement, let's not be smart for old workers. If we can 
identify that the topology version is below 2.0.0, just ignore the heartbeats 
the supervisor is reporting and read heartbeats from ZK. This avoids the 
headache of aggregating heartbeats from supervisor RPC and ZK.
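
    A minimal sketch of that version check, in Java. Every name here 
(HeartbeatSourceSelector, Source, sourceFor) is hypothetical and not taken from 
Storm or from this patch, and treating a missing or unparsable version as "old" 
is an assumption made only for illustration.

```java
// Hypothetical sketch: pick one heartbeat source per topology, based on the
// version the topology was submitted with. Not Storm's actual API.
final class HeartbeatSourceSelector {

    enum Source { SUPERVISOR_RPC, ZOOKEEPER }

    // Extracts the major version from strings such as "1.2.3" or "2.0.0-SNAPSHOT".
    static int majorVersion(String topologyVersion) {
        int dot = topologyVersion.indexOf('.');
        String major = dot < 0 ? topologyVersion : topologyVersion.substring(0, dot);
        return Integer.parseInt(major.trim());
    }

    // Topologies below 2.0.0 are read from ZK only; supervisor-reported
    // heartbeats are ignored for them, so the two sources are never merged.
    static Source sourceFor(String topologyVersion) {
        if (topologyVersion == null) {
            // Assumption: a missing version means the topology predates 2.0.0.
            return Source.ZOOKEEPER;
        }
        try {
            return majorVersion(topologyVersion) < 2
                    ? Source.ZOOKEEPER
                    : Source.SUPERVISOR_RPC;
        } catch (NumberFormatException e) {
            // Assumption: an unparsable version is treated like an old worker.
            return Source.ZOOKEEPER;
        }
    }
}
```

    With this shape, the choice is made once per topology, so Nimbus would 
never need to aggregate supervisor RPC and ZK heartbeats for the same topology.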

