kimyen commented on issue #17920: URL: https://github.com/apache/airflow/issues/17920#issuecomment-909729347
@jedcunningham, thank you for responding. There is definitely a gap in my knowledge here about what is needed to export data to Datadog. I have some follow-up questions:

- With the config above, does `statsd_host: localhost` mean that a DogStatsD agent is running in the same pod as the scheduler and each worker? We have DAGs that emit metrics to Datadog, and we set an env var for which host IP to use to reach the Datadog agent.
- If there is **not** a DogStatsD agent running in the same pod as the scheduler and each worker pod, then we need one Datadog agent running somewhere that we can point `DOGSTATSD_HOST`, `DD_AGENT_HOST`, and `AIRFLOW__SCHEDULER__STATSD_HOST` at. We used to use the Datadog Helm chart (https://github.com/DataDog/helm-charts/tree/main/charts/datadog) alongside Airflow, but this creates a Datadog agent pod for each Airflow pod. We would prefer to have a single pod for the DogStatsD agent.
- With the config above, how do I specify statsd mappings? When using the Datadog Helm chart, the mappings would go in `datadog-values.yaml`. Our mappings group many Airflow metrics into a single generic metric and turn the variables in the metric name into tags, since a large number of distinct metric names increases our Datadog bill. For example, this one reduces 300+ metrics (one per DAG we have) down to 1:

  ```yaml
  - match: 'airflow\.dag_processing\.last_duration\.(.*)'
    match_type: "regex"
    name: "airflow.dag_processing.last_duration"
    tags:
      dag_file: "$1"
  ```
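To make the intent of that mapping concrete, here is a small Python sketch of what such a rule does to an incoming metric name. This is a hypothetical illustration, not Datadog's actual mapper implementation; the `apply_mapping` helper and the `MAPPING` dict are names I made up for the example:

```python
import re

# Hypothetical stand-in for the YAML mapping rule above: collapse the
# per-DAG metric names into one generic metric name plus a tag.
MAPPING = {
    "match": r"airflow\.dag_processing\.last_duration\.(.*)",
    "name": "airflow.dag_processing.last_duration",
    "tags": {"dag_file": "$1"},
}

def apply_mapping(metric_name, mapping):
    """Return (mapped_name, tags) for a dotted statsd metric name."""
    m = re.fullmatch(mapping["match"], metric_name)
    if not m:
        # No match: the metric passes through unchanged, with no extra tags.
        return metric_name, {}
    # Substitute $1, $2, ... in tag values with the corresponding regex groups.
    tags = {
        key: re.sub(r"\$(\d+)", lambda g: m.group(int(g.group(1))), value)
        for key, value in mapping["tags"].items()
    }
    return mapping["name"], tags

name, tags = apply_mapping("airflow.dag_processing.last_duration.my_dag.py", MAPPING)
# name -> "airflow.dag_processing.last_duration", tags -> {"dag_file": "my_dag.py"}
```

Every per-DAG metric name collapses to the same generic name, so Datadog sees one metric with a `dag_file` tag instead of one metric per DAG file.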
