AutomationDev85 opened a new pull request, #46510:
URL: https://github.com/apache/airflow/pull/46510

   # Overview
   Hi Airflow community,
   
   I'm currently trying to export Airflow metrics via OTEL -> Prometheus into Grafana dashboards. I want to take advantage of OTEL labels to build nice dashboards.
   
   During the implementation I found an issue. I will use the metric airflow_dag_processing_last_duration to explain the details:
   
   This metric is emitted in two different ways in the Airflow code, to support both the statsd style (file name in the metric name) and the OTEL style (file name as a label):
   Stats.timing(f"dag_processing.last_duration.{file_name}", stat.last_duration)
   Stats.timing("dag_processing.last_duration", stat.last_duration, tags={"file_name": file_name})
   
   The following Prometheus example contains the metric export for 2 DAGs (dag1, dag2). The labeled metric (OTEL) is exported for only 1 DAG, while the metric with file_name in the metric name is exported for both DAGs.
   
   # HELP airflow_dag_processing_last_duration
   # TYPE airflow_dag_processing_last_duration gauge
   airflow_dag_processing_last_duration{file_name="dag1",job="Airflow"} 0.293856
   
   # HELP airflow_dag_processing_last_duration_dag1
   # TYPE airflow_dag_processing_last_duration_dag1 gauge
   airflow_dag_processing_last_duration_dag1{job="Airflow"} 0.293856
   # HELP airflow_dag_processing_last_duration_dag2
   # TYPE airflow_dag_processing_last_duration_dag2 gauge
   airflow_dag_processing_last_duration_dag2{job="Airflow"} 0.343803
   
   I would expect the labeled metric to also be exported for dag2, like this:
   airflow_dag_processing_last_duration{file_name="dag2",job="Airflow"} 0.343803
   
   
   So I debugged for a while and found that this issue only affects the gauge export. If the metric is a counter, the label export works fine.
   
   The issue is that the gauge is created as an ObservableGauge, so OTEL uses a callback to collect the metric. For this, the OTEL Python library creates instruments to handle the metric. The downside is that if a second observable instrument with the same metric name is created, OTEL will only create one instrument, because it checks for the metric name.
   As a result, only the callback of the first registered metric is executed, and all other metrics with the same name but different labels are ignored.
   My idea is to switch to a synchronous gauge export of the metric, as is already done for the counter export.
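
To illustrate the behaviour described above, here is a minimal toy model in plain Python. It is not the OpenTelemetry SDK and not the code of this PR: `ToyMeter` and all of its methods are hypothetical names, and the "deduplicate by metric name" rule is simply my reading of what I observed while debugging. The sketch only shows why one callback per name loses the second DAG, and why setting a value per label set does not.

```python
from typing import Callable, Dict, Tuple

Attrs = Tuple[Tuple[str, str], ...]
Callback = Callable[[], Tuple[Attrs, float]]


class ToyMeter:
    """Hypothetical stand-in for a meter. NOT the OpenTelemetry SDK."""

    def __init__(self) -> None:
        # Observable instruments are keyed by metric name only (assumption).
        self._observable: Dict[str, Callback] = {}
        # Synchronous gauge values are keyed by (name, attributes).
        self._sync: Dict[Tuple[str, Attrs], float] = {}

    def create_observable_gauge(self, name: str, callback: Callback) -> None:
        # A second registration under an existing name is silently dropped,
        # so only the first callback is ever invoked.
        self._observable.setdefault(name, callback)

    def set_gauge(self, name: str, value: float, attrs: Dict[str, str]) -> None:
        # Each distinct label set gets its own series.
        self._sync[(name, tuple(sorted(attrs.items())))] = value

    def collect_observable(self) -> Dict[Tuple[str, Attrs], float]:
        out: Dict[Tuple[str, Attrs], float] = {}
        for name, callback in self._observable.items():
            attrs, value = callback()
            out[(name, attrs)] = value
        return out

    def collect_sync(self) -> Dict[Tuple[str, Attrs], float]:
        return dict(self._sync)


def make_callback(file_name: str, value: float) -> Callback:
    attrs: Attrs = (("file_name", file_name),)
    return lambda: (attrs, value)


NAME = "dag_processing.last_duration"
durations = {"dag1": 0.293856, "dag2": 0.343803}

observable_meter = ToyMeter()
sync_meter = ToyMeter()
for file_name, value in durations.items():
    observable_meter.create_observable_gauge(NAME, make_callback(file_name, value))
    sync_meter.set_gauge(NAME, value, {"file_name": file_name})

observable_series = observable_meter.collect_observable()
sync_series = sync_meter.collect_sync()
print(len(observable_series), "observable series,", len(sync_series), "sync series")
```

In this model the observable path yields a single series (the one for dag1), while the synchronous path yields one series per file_name, which matches the Prometheus output shown earlier.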
   
   I'm not sure why the ObservableGauge was used, but I did not find a way to fix the issue without switching to the sync gauge export.
   I'm also not sure about any downsides of using a sync gauge, such as runtime cost, but the counter metric export also uses a sync instrument. Does anyone have more know-how in this area and can give feedback on this?
   
   # Details of changes:
   * Use a normal sync Gauge instead of an ObservableGauge.
   * Moved the gauge-handling logic into the InternalGauge class.
   
   Looking forward to fixing this issue and to your feedback!

