[GitHub] [airflow] howardyoo commented on a diff in pull request #30496: [OTel Integration] Add tagging to existing stats

via GitHub Wed, 12 Apr 2023 21:01:14 -0700


howardyoo commented on code in PR #30496:
URL: https://github.com/apache/airflow/pull/30496#discussion_r1163519181



##########
airflow/dag_processing/processor.py:
##########
@@ -534,7 +535,9 @@ def manage_slas(cls, dag_folder, dag_id: str, session: 
Session = NEW_SESSION) ->
                     email_sent = True
                     notification_sent = True
                 except Exception:
-                    Stats.incr("sla_email_notification_failure", 
tags={"dag_id": dag.dag_id})
+                    Stats.incr(
+                        "sla_email_notification_failure", tags={"dag_id": 
dag.dag_id, "email": str(emails)}

Review Comment:
   Hey @ferruzzi , I think we can remove the actual email value from the 
metric. I do agree with @vandonr-amz , that the email values could introduce 
the unnecessary cardinality, but also it should be easy to track its name via 
the provided `dag_id`.



##########
airflow/models/taskinstance.py:
##########
@@ -540,6 +540,10 @@ def __init__(
         # can be changed when calling 'run'
         self.test_mode = False
 
+    @property
+    def stats_tags(self) -> dict[str, str]:
+        return prune_dict({"dag_id": self.dag_id, "run_id": str(self.run_id), 
"task_id": self.task_id})

Review Comment:
   Even though this may be a breaking change, after thinking about it, I 
believe adding run_id was not a good design decision as the presence of that 
label would not create a continuous time series, but rather a multiple metrics 
data that are going to become high-cardinality pattern.
   
   My usual thought process for determining if something is a metric data or 
not, is whether the series could be something that we could connect it as a 
'time series' (connecting dots to make it into the chart) or not. Having each 
data point with distinct run_id would not make it into time series, but just a 
series of distinct individual 'events' that would simply be plotted as dots in 
the chart (since each of them would have their own distinct run_id).



##########
airflow/models/taskinstance.py:
##########
@@ -540,6 +540,10 @@ def __init__(
         # can be changed when calling 'run'
         self.test_mode = False
 
+    @property
+    def stats_tags(self) -> dict[str, str]:
+        return prune_dict({"dag_id": self.dag_id, "run_id": str(self.run_id), 
"task_id": self.task_id})

Review Comment:
   It is highly likely that the str() isn't necessary for self.run_id - as it 
should be of string type.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [airflow] howardyoo commented on a diff in pull request #30496: [OTel Integration] Add tagging to existing stats

Reply via email to