GitHub user abhishekagarwal87 commented on the issue:
https://github.com/apache/storm/pull/1595
@HeartSaVioR - Thanks for the effort you are putting into improving metrics. The
approach of passing metrics via the system bolt looks agreeable to me, though I
would still prefer the "two types of metric consumers" approach.
Secondly, regarding aggregation: component-level aggregation is strongly
preferred. That is what we do in external systems, simply because
`total number of messages processed by component A` is more useful and
actionable information than `total number of messages processed on this machine
and port`. If the component information is dropped during aggregation, I
don't think the metrics will be used much. At least I wouldn't use them in our
production system. Load among tasks of the same component is usually
homogeneous and uniform (law of large numbers), but the same is not true across
different components, which execute different code altogether. I think
aggregating per worker+component combination would still reduce the number of
metrics being emitted while keeping the metrics useful.
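
As a rough illustration of the worker+component aggregation, here is a minimal sketch. The record fields (`host`, `port`, `component`, `task`, `value`) and the function name are hypothetical, chosen only for this example; this is not Storm's metrics API.

```python
from collections import defaultdict


def aggregate_by_worker_component(task_metrics):
    """Sum per-task values into one value per (host, port, component).

    The record layout is an assumption made for illustration only.
    """
    totals = defaultdict(int)
    for m in task_metrics:
        # The task id is dropped from the key; the component is kept.
        key = (m["host"], m["port"], m["component"])
        totals[key] += m["value"]
    return dict(totals)


# Hypothetical per-task data points for one reporting interval.
task_metrics = [
    {"host": "node1", "port": 6700, "component": "A", "task": 1, "value": 100},
    {"host": "node1", "port": 6700, "component": "A", "task": 2, "value": 110},
    {"host": "node1", "port": 6700, "component": "B", "task": 3, "value": 40},
    {"host": "node2", "port": 6700, "component": "A", "task": 4, "value": 90},
]

per_worker_component = aggregate_by_worker_component(task_metrics)

# A consumer can still recover the per-component total afterwards.
total_for_a = sum(
    value
    for (_, _, component), value in per_worker_component.items()
    if component == "A"
)
```

Here four per-task data points collapse into three, and the total for component A can still be derived downstream, which would not be possible if aggregation kept only host and port.
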
Thirdly, the assumption that a metric declared with a 60-second update
interval will actually arrive at 60-second intervals is not correct. Updates can
arrive 120 seconds apart, or they can arrive 5 seconds apart (depending on
how fast or slow the task is).
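
One way a consumer could cope with this, sketched under the assumption that each data point carries its own timestamp, is to divide by the observed gap rather than the declared interval. The function and parameter names are hypothetical, not part of Storm.

```python
def per_second_rate(count, prev_timestamp, curr_timestamp):
    """Rate based on the observed gap between two data points (in seconds)."""
    elapsed = curr_timestamp - prev_timestamp
    if elapsed <= 0:
        raise ValueError("data points must have increasing timestamps")
    return count / elapsed


DECLARED_INTERVAL_SECS = 60

# Hypothetical data point: 600 messages, arriving 120 seconds after the last one.
observed_rate = per_second_rate(600, prev_timestamp=1000, curr_timestamp=1120)

# What a consumer would compute if it trusted the declared interval instead.
assumed_rate = 600 / DECLARED_INTERVAL_SECS
```

With these numbers the observed rate is 5 messages per second, while trusting the declared 60-second interval would report 10.
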