[ https://issues.apache.org/jira/browse/AMBARI-16946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jungtaek Lim updated AMBARI-16946:
----------------------------------
    Attachment: AMBARI-16946.patch

Reattaching the patch since it was not in the same format as the patches on other issues.

> Storm Metrics Sink has high chance to discard some datapoints
> -------------------------------------------------------------
>
>                 Key: AMBARI-16946
>                 URL: https://issues.apache.org/jira/browse/AMBARI-16946
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-metrics
>            Reporter: Jungtaek Lim
>         Attachments: AMBARI-16946.patch
>
>
> There's a mismatch between TimelineMetricsCache and the unit of a Storm
> metric: TimelineMetricsCache treats "metric name + timestamp" as unique, but
> Storm does not guarantee that.
> For example, assume that bolt B has tasks T1 and T2, and B has registered
> metric M1. The metrics sink can receive (T1, M1) and (T2, M1) with the same
> timestamp TS1 (taken from TaskInfo, not the current time), and whichever
> arrives later is discarded by TimelineMetricsCache.
> To make each Storm metric point unique, we should use "topology name
> + component name + task id + metric name" as the metric name so that "metric
> name + timestamp" is unique.
> There are other issues I would like to address, too.
> - Currently, hostname is set to the hostname of the machine that runs the
> metrics sink. Since TaskInfo carries the hostname of the machine that runs
> the task, we should use that instead.
> - The timestamp in TaskInfo is in seconds, but Storm Metrics Sink treats it
> as milliseconds, which produces wrong timestamps and breaks cache eviction.
> It should be multiplied by 1000.
> - 'component name' is not unique across the cluster, so it is not suitable as
> the app id. 'topology name' is unique, so the topology name is the proper
> value for app id.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
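
The four changes proposed in the issue description above can be sketched together as follows. This is only an illustration in Python, not the attached patch or the actual sink code: the `TaskInfo` tuple, the function names, the dict layout, and the "." separator are all hypothetical, and the cache is modelled as a plain dict keyed on "metric name + timestamp".

```python
from typing import NamedTuple


class TaskInfo(NamedTuple):
    """Hypothetical stand-in for the per-task fields the issue refers to."""
    component_name: str
    task_id: int
    hostname: str        # machine that runs the task, not the metrics sink
    timestamp_secs: int  # the issue states this value is in seconds


def to_timeline_point(topology_name: str, task: TaskInfo,
                      metric_name: str, value: float) -> dict:
    """Build one metric point following the proposals in the issue."""
    # Qualify the name so "metric name + timestamp" is unique per task.
    # The "." separator is an assumption made for this sketch.
    qualified_name = ".".join(
        [topology_name, task.component_name, str(task.task_id), metric_name]
    )
    return {
        "app_id": topology_name,                     # unique across the cluster
        "hostname": task.hostname,                   # host of the task
        "metric_name": qualified_name,
        "timestamp_ms": task.timestamp_secs * 1000,  # seconds to milliseconds
        "value": value,
    }


def put_if_absent(cache: dict, point: dict) -> bool:
    """Model a cache that drops a point whose (name, timestamp) key exists."""
    key = (point["metric_name"], point["timestamp_ms"])
    if key in cache:
        return False  # discarded as a duplicate
    cache[key] = point["value"]
    return True
```

With unqualified names, (T1, M1) and (T2, M1) at the same timestamp would map to the same key and the second would be dropped; with the qualified name, both points are kept.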