htpawel commented on PR #37936: URL: https://github.com/apache/airflow/pull/37936#issuecomment-2024692435
Ok, I did further investigation and now I know everything :D Statsd by convention **should always** emit milliseconds. (See `ms` in their code: `self._send_stat(stat, '%0.6f|ms' % delta, rate)`). And that is why they are requesting time delta object or value in milliseconds: (in their code: `delta can be either a number of milliseconds or a timedelta.`). But later you need to do something with those metrics, so we (and probably most of the people) use statsd-exporter from prometheus. And in statsd-exporter documentation you can read `Timers will be accepted with the ms statsd type. Statsd timer data is transmitted in milliseconds, while Prometheus expects the unit to be seconds. The exporter converts all timer observations to seconds.`. So that is why I thought it is emitted in seconds, I was just checking statsd-exporter /metrics endpoint. So it turns out that not this, but other metrics are wrong in documentation, eg. dagrun.schedule_delay.<dag_id> as it passes time delta to Stats.timing so emits milliseconds but airflow documentation say s those are seconds. And above two from this PR are indeed emitted in seconds, but obviously it is not correct either as those should be milliseconds and are emitted with `ms` suffix. So there are more things f*cked up generally. Timing/Timer should always receive milliseconds value or time delta object and it emits milliseconds, which can be later consumed by statsd-exporter and exporter in seconds for prometheus to scrape them. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
