sjmiller609 opened a new issue #9823:
URL: https://github.com/apache/airflow/issues/9823
**Description**
I am seeking approval for a minor feature. We use Airflow metrics, two of
which are airflow.operator_failures_.* and airflow.operator_successes_.*.
The names of these metrics include the operator name, for example
airflow.operator_successes_PythonOperator.
With regard to time series data, it is best practice to limit the number of
possible values for any given attribute in a metric. For example, we record
metrics such as "airflow_operator_failures" in our TSDB with "operator" as one
of the attributes. Here are some sample entries:
```
airflow_operator_failures{operator="ExternalTaskSensor"} 5
airflow_operator_failures{operator="GCSToPostgresOperator"} 10
airflow_operator_failures{operator="PipedriveToCloudStorageOperator"} 2
airflow_operator_failures{operator="PostgresOperator"} 24
airflow_operator_failures{operator="PrometheusToGCSOperator"} 128
airflow_operator_failures{operator="MyTopSecretOperator"} 676
```
Because operators may be named by the Airflow user, we are concerned that
1) the "operator" attribute has too many (effectively unbounded) possible
values, causing performance issues in the TSDB, and 2) the metrics system
exposes potentially sensitive information, for example "MyTopSecretOperator",
which our operations team prefers to omit from our metrics system.
With approval, I would contribute a change that retains the current behavior
by default but optionally accepts an allow-list of operator names in the
Airflow configuration. If provided, this list would define the only allowed
values for the operator name in the metrics airflow.operator_successes_.* and
airflow.operator_failures_.*, with all other operators falling into an "Other"
category. For example:
```
airflow_operator_failures{operator="BashOperator"} 5
airflow_operator_failures{operator="PythonOperator"} 10
airflow_operator_failures{operator="Other"} 100
```
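A minimal sketch of how the allow-list proposed above might behave. This is not existing Airflow code: the function names `parse_allow_list` and `metric_operator_name`, the comma-separated config format, and the `OTHER_LABEL` constant are all hypothetical and used only for illustration.

```python
from typing import Iterable, Optional, Set

# Hypothetical catch-all name, matching the "Other" entry in the sample above.
OTHER_LABEL = "Other"


def parse_allow_list(raw: Optional[str]) -> Optional[Set[str]]:
    """Parse a comma-separated config value; return None when unset or blank."""
    if raw is None or not raw.strip():
        return None
    return {name.strip() for name in raw.split(",") if name.strip()}


def metric_operator_name(
    operator_name: str, allow_list: Optional[Iterable[str]] = None
) -> str:
    """Return the operator name to embed in the metric name."""
    # No allow-list configured: keep the current behavior unchanged.
    if allow_list is None:
        return operator_name
    # Allow-list configured: anything not listed is folded into one bucket.
    return operator_name if operator_name in set(allow_list) else OTHER_LABEL


if __name__ == "__main__":
    allow = parse_allow_list("BashOperator, PythonOperator")
    for op in ("BashOperator", "MyTopSecretOperator"):
        print("airflow.operator_failures_" + metric_operator_name(op, allow))
```

With an allow-list configured, the number of distinct metric names is bounded by the list length plus one, and custom operator names never reach the metrics system.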
**Use case / motivation**
In our use case, we persist metrics data using Prometheus and the statsd
exporter (which translates statsd metrics into the Prometheus format). It is
best practice for Prometheus (and any TSDB) to keep
[cardinality](https://www.robustperception.io/cardinality-is-key) low on all
metrics. We have found performance issues with regard to
airflow.operator_failures_.*. One option is for us to drop all information
about the operator name from this metric, but we believe that the feature
proposed here would allow more informative metrics while retaining reasonable
cardinality (for example, by allowing all operators that ship with Airflow by
default).
In general, this story is one step toward making Airflow metrics more
Prometheus-friendly. Other metrics could also be made more Prometheus-friendly,
but I hope to constrain this issue to the two mentioned.
**Related Issues**
none
**Alternatives**
This could also be accomplished in the statsd-exporter project.