sjmiller609 opened a new issue #9823:
URL: https://github.com/apache/airflow/issues/9823


   **Description**
   
   I am seeking approval for a minor feature. We are using Airflow metrics,
   two of which are airflow.operator_failures_.* and
   airflow.operator_successes_.*. The names of these metrics include the
   operator name, for example airflow.operator_successes_PythonOperator.
   
   With regard to time series data, it is best practice to limit the number of
   possible values for any given attribute of a metric. For example, we record
   metrics such as "airflow_operator_failures" in our TSDB with "operator" as
   one of the attributes. Here are some sample entries:
   
   ```
   airflow_operator_failures{operator="ExternalTaskSensor"} 5
   airflow_operator_failures{operator="GCSToPostgresOperator"} 10
   airflow_operator_failures{operator="PipedriveToCloudStorageOperator"} 2
   airflow_operator_failures{operator="PostgresOperator"} 24
   airflow_operator_failures{operator="PrometheusToGCSOperator"} 128
   airflow_operator_failures{operator="MyTopSecretOperator"} 676
   ```
   
   We are concerned that, since operators may be named by the Airflow user,
   1) there are too many (effectively unbounded) possible values for the
   "operator" attribute, causing performance issues in the TSDB, and 2) the
   metrics system exposes potentially sensitive information, for example
   "MyTopSecretOperator", which our operations team prefers to omit from our
   metrics system.
   
   With approval, I would contribute a change that retains the current
   behavior by default but optionally accepts an allow-list of operator names
   in the Airflow configuration. If provided, this list holds the only allowed
   values for the operator name in the metrics airflow.operator_successes_.*
   and airflow.operator_failures_.*, and any operator not on the list falls
   into an "Other" category. For example:
   ```
   airflow_operator_failures{operator="BashOperator"} 5
   airflow_operator_failures{operator="PythonOperator"} 10
   airflow_operator_failures{operator="Other"} 100
   ```
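
A minimal sketch of the proposed mapping, for discussion only. The names
`metric_operator_name`, `failure_metric`, and `OTHER_LABEL` are hypothetical,
and no such configuration option exists in Airflow today; how the allow-list
would be read from the Airflow configuration is left open.

```python
from typing import Iterable, Optional

# Hypothetical label for operators that are not on the allow-list.
OTHER_LABEL = "Other"


def metric_operator_name(
    operator_name: str, allow_list: Optional[Iterable[str]] = None
) -> str:
    """Return the operator name to embed in the metric name.

    With no allow-list configured, the name passes through unchanged,
    which keeps the current default behavior.
    """
    if allow_list is None:
        return operator_name
    return operator_name if operator_name in set(allow_list) else OTHER_LABEL


def failure_metric(
    operator_name: str, allow_list: Optional[Iterable[str]] = None
) -> str:
    """Build the failure counter name for an operator (assumed naming)."""
    return f"operator_failures_{metric_operator_name(operator_name, allow_list)}"


allowed = ["BashOperator", "PythonOperator"]
print(failure_metric("BashOperator", allowed))
print(failure_metric("MyTopSecretOperator", allowed))
print(failure_metric("MyTopSecretOperator"))
```

With the allow-list above, the number of distinct metric names is bounded by
the length of the list plus one, and custom operator names never reach the
metrics system.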
   
   **Use case / motivation**
   
   In our use case, we persist metrics data using Prometheus and the statsd
   exporter (which translates statsd metrics into the Prometheus metrics
   format). It is best practice for Prometheus (and any TSDB) to minimize
   [cardinality](https://www.robustperception.io/cardinality-is-key) on all
   metrics. We have encountered performance issues with
   airflow.operator_failures_.*. One option is for us to drop all operator-name
   information from this metric, but we believe the feature proposed here would
   allow more informative metrics while retaining reasonable cardinality (for
   example, by allow-listing all operators included in Airflow by default).
   
   In general, this story is one step toward making Airflow metrics more
   Prometheus-friendly. Other metrics could also be made more
   Prometheus-friendly, but I hope to constrain this issue to the two
   mentioned.
   
   **Related Issues**
   
   none
   
   **Alternatives**
   
   It is also possible to accomplish this in the statsd-exporter project.
   

