Github user revans2 commented on the issue:

    https://github.com/apache/storm/pull/1674
  
    My example was theoretical; I honestly don't know whether in practice Nimbus would see a supervisor appear and then disappear if Nimbus's NIC were bad. The more common case would be a bad NIC on a ZK node, which could cause this. But that is beside the point. There are lots of different things that could make most of the cluster look bad, or actually go bad for real. I am fine with detecting/handling specific cases that we know about and can reproduce, but we should have some sort of default catch-all.
    
    For example, if HDFS loses too many nodes it goes into read-only mode, while YARN ignores the loss but uses metrics to alert the cluster owners. If we feel that, because we are primarily compute like YARN, we want a metric for how many nodes are blacklisted, that seems like a perfectly fine default. If we run into other situations and can detect/auto-correct them, even better.
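
    To make the metric idea concrete, here is a minimal sketch of how a blacklisted-node count could be exposed as a gauge. It uses the Dropwizard Metrics library for illustration; the `BlacklistMetrics` class, the `blacklistedSupervisors` set, and the metric name are hypothetical, not the actual Storm blacklist scheduler API.

    ```java
    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;

    import com.codahale.metrics.Gauge;
    import com.codahale.metrics.MetricRegistry;

    public class BlacklistMetrics {
        // Hypothetical set of supervisor ids currently blacklisted by the scheduler.
        private final Set<String> blacklistedSupervisors = ConcurrentHashMap.newKeySet();

        public BlacklistMetrics(MetricRegistry registry) {
            // Expose the current blacklist size so cluster owners can alert on it,
            // e.g. page someone when most of the cluster looks bad.
            registry.register("nimbus.blacklisted-supervisors",
                    (Gauge<Integer>) blacklistedSupervisors::size);
        }

        public void blacklist(String supervisorId) {
            blacklistedSupervisors.add(supervisorId);
        }

        public void resume(String supervisorId) {
            blacklistedSupervisors.remove(supervisorId);
        }
    }
    ```

    Alerting on that gauge, rather than taking automatic action, matches the YARN approach described above: the cluster keeps scheduling while operators decide whether the blacklisting is real or a monitoring artifact.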

