kimyen commented on issue #17920:
URL: https://github.com/apache/airflow/issues/17920#issuecomment-909729347


   @jedcunningham, thank you for responding to me. There is definitely a gap 
in my knowledge about what is needed to export data to Datadog. I have 
some follow-up questions:
   - With the config above, does `statsd_host: localhost` mean that a 
DogStatsD agent must be running in the same pod as the scheduler and each worker? We 
have DAGs that emit metrics to Datadog, and we set an env var with the host IP to use 
to reach the Datadog agent.
   - If there is **not** a DogStatsD agent running in the same pod as the 
scheduler and each worker, then we need a single Datadog agent running somewhere 
that we can point `DOGSTATSD_HOST`, `DD_AGENT_HOST`, and 
`AIRFLOW__SCHEDULER__STATSD_HOST` at. We used to use the Datadog helm chart with 
Airflow (https://github.com/DataDog/helm-charts/tree/main/charts/datadog), 
but this creates a Datadog agent pod for each Airflow pod. We would prefer a 
single DogStatsD agent pod.
   - With the config above, how do I specify statsd mappings? When using the 
Datadog helm chart, the mappings go in `datadog-values.yaml`. Our mappings 
group many Airflow metrics into a single generic metric and turn the variable parts 
of the metric name into tags, since a large number of distinct metric names increases 
our Datadog bill. For example, this one reduces 300+ metrics (one per DAG we have) 
down to 1:
   ```yaml
         - match: 'airflow\.dag_processing\.last_duration\.(.*)'
           match_type: "regex"
           name: "airflow.dag_processing.last_duration"
           tags:
             dag_file: "$1"
   ```
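
   On the single-agent setup in the second question: assuming a standalone 
DogStatsD agent is exposed through a Kubernetes Service (the Service name 
`dogstatsd` and namespace `monitoring` below are hypothetical), the Airflow pods 
could be pointed at it with plain env vars instead of `localhost`. A minimal 
sketch, with the exact placement depending on the chart in use:

   ```yaml
   # Hypothetical sketch: env vars on the Airflow scheduler/worker pods
   # pointing at one shared DogStatsD Service instead of a per-pod sidecar.
   env:
     - name: AIRFLOW__SCHEDULER__STATSD_HOST   # Airflow's own statsd client
       value: "dogstatsd.monitoring.svc.cluster.local"
     - name: AIRFLOW__SCHEDULER__STATSD_PORT
       value: "8125"
     - name: DD_AGENT_HOST                     # for DAG code using the datadog client libs
       value: "dogstatsd.monitoring.svc.cluster.local"
   ```

   The point is only that `statsd_host` does not have to be `localhost`; any 
hostname reachable from the pods works.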
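
   On the mappings question: in the Datadog Agent's own configuration, 
mappings of this shape live under `dogstatsd_mapper_profiles`. As a hedged 
sketch wrapping the mapping above (the profile name `airflow` is illustrative, 
and where this block goes inside `datadog-values.yaml` may vary by chart 
version):

   ```yaml
   # Sketch: a DogStatsD mapper profile containing the mapping above.
   dogstatsd_mapper_profiles:
     - name: airflow
       prefix: "airflow."
       mappings:
         - match: 'airflow\.dag_processing\.last_duration\.(.*)'
           match_type: "regex"
           name: "airflow.dag_processing.last_duration"
           tags:
             dag_file: "$1"
   ```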
   

