howardyoo commented on code in PR #30496:
URL: https://github.com/apache/airflow/pull/30496#discussion_r1163519181
##########
airflow/dag_processing/processor.py:
##########
@@ -534,7 +535,9 @@ def manage_slas(cls, dag_folder, dag_id: str, session:
Session = NEW_SESSION) ->
email_sent = True
notification_sent = True
except Exception:
- Stats.incr("sla_email_notification_failure",
tags={"dag_id": dag.dag_id})
+ Stats.incr(
+ "sla_email_notification_failure", tags={"dag_id":
dag.dag_id, "email": str(emails)}
Review Comment:
Hey @ferruzzi , I think we can remove the actual email value from the
metric. I do agree with @vandonr-amz , that the email values could introduce
the unnecessary cardinality, but also it should be easy to track its name via
the provided `dag_id`.
##########
airflow/models/taskinstance.py:
##########
@@ -540,6 +540,10 @@ def __init__(
# can be changed when calling 'run'
self.test_mode = False
+ @property
+ def stats_tags(self) -> dict[str, str]:
+ return prune_dict({"dag_id": self.dag_id, "run_id": str(self.run_id),
"task_id": self.task_id})
Review Comment:
Even though this may be a breaking change, after thinking about it, I
believe adding run_id was not a good design decision as the presence of that
label would not create a continuous time series, but rather a multiple metrics
data that are going to become high-cardinality pattern.
My usual thought process for determining if something is a metric data or
not, is whether the series could be something that we could connect it as a
'time series' (connecting dots to make it into the chart) or not. Having each
data point with distinct run_id would not make it into time series, but just a
series of distinct individual 'events' that would simply be plotted as dots in
the chart (since each of them would have their own distinct run_id).
##########
airflow/models/taskinstance.py:
##########
@@ -540,6 +540,10 @@ def __init__(
# can be changed when calling 'run'
self.test_mode = False
+ @property
+ def stats_tags(self) -> dict[str, str]:
+ return prune_dict({"dag_id": self.dag_id, "run_id": str(self.run_id),
"task_id": self.task_id})
Review Comment:
It is highly likely that the str() isn't necessary for self.run_id - as it
should be of string type.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]