GitHub user d2r commented on the pull request:

    https://github.com/apache/storm/pull/392#issuecomment-71714975
  
    > Not to derail the discussion but personally, I would much rather not 
store errors in ZK at all if it's just for rendering the errors in the UI.  If the 
spouts/bolts could just store this in memory with some expiration, that should 
suffice, and we could expose an API at the worker layer to get this information 
directly from it. If the host dies you lose some errors, but that does not seem 
like a big deal. The only downside is that the UI would now have to make requests 
against worker hosts to get errors, but that seems OK to me. You would also get 
parallelism, as all these worker calls can be made in parallel. I haven't 
thought this through completely and it's probably much more work, but I would 
love to hear your opinion.
    
    Yeah, we were thinking about distributing things this way too.  We figured 
that the bigger problem is the heartbeats, and if we could get an improvement 
with less effort here, it would be worth it.  It would be a much bigger change 
to distribute the errors out of ZK, yet maybe it is not a bad idea.  (Also, I 
think it is good to persist the errors anyway, not just in memory.  Users would 
like to see errors on the UI even if there was some issue that brought the 
supervisor down—like a rolling upgrade of the cluster.)  Maybe we could file 
a JIRA for better gathering of errors.
    
    This change was intended to be small in scope and just give a way to get 
errors more efficiently when a topology has many, many components.  It was 
prompted by seeing topology page load times of minutes from one of our 
customers.  Plus, this may be less of a problem once heartbeats (and their 
metrics) are no longer getting sent around, but still it may not be a bad idea to 
use a more distributed model like you suggest.
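    
    To make the quoted suggestion a bit more concrete, here is a minimal sketch 
of a per-worker error buffer that keeps errors in memory with an expiration.  
Everything in it is hypothetical (the class `InMemoryErrorStore`, its methods, 
the TTL and per-component cap) and none of it is existing Storm API; it only 
illustrates the "store in memory, expire old entries" part, not the worker 
endpoint or the UI fan-out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical per-worker error buffer; not part of Storm's API. */
class InMemoryErrorStore {
    /** One reported error and the time it was recorded. */
    static final class ErrorEntry {
        final String message;
        final long timestampMs;

        ErrorEntry(String message, long timestampMs) {
            this.message = message;
            this.timestampMs = timestampMs;
        }
    }

    private final long ttlMs;            // assumed expiration window
    private final int maxPerComponent;   // assumed cap to bound memory use
    private final Map<String, Deque<ErrorEntry>> errors = new ConcurrentHashMap<>();

    InMemoryErrorStore(long ttlMs, int maxPerComponent) {
        this.ttlMs = ttlMs;
        this.maxPerComponent = maxPerComponent;
    }

    /** Record an error for a component, dropping the oldest entry once over the cap. */
    void report(String componentId, String message, long nowMs) {
        Deque<ErrorEntry> queue = errors.computeIfAbsent(componentId, k -> new ArrayDeque<>());
        synchronized (queue) {
            queue.addLast(new ErrorEntry(message, nowMs));
            while (queue.size() > maxPerComponent) {
                queue.removeFirst();
            }
        }
    }

    /** Return unexpired error messages for a component, oldest first. */
    List<String> recent(String componentId, long nowMs) {
        List<String> out = new ArrayList<>();
        Deque<ErrorEntry> queue = errors.get(componentId);
        if (queue == null) {
            return out;
        }
        synchronized (queue) {
            // Entries are in insertion order, so expired ones are at the head.
            while (!queue.isEmpty() && nowMs - queue.peekFirst().timestampMs > ttlMs) {
                queue.removeFirst();
            }
            for (ErrorEntry entry : queue) {
                out.add(entry.message);
            }
        }
        return out;
    }
}
```

    A worker-level endpoint could serve `recent(...)` to the UI.  As noted above, 
anything held this way is lost when the worker or host goes down, which is the 
trade-off against persisting errors.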



