BasPH commented on code in PR #28907:
URL: https://github.com/apache/airflow/pull/28907#discussion_r1070102683
##########
airflow/datasets/manager.py:
##########
@@ -66,6 +67,7 @@ def register_dataset_change(
)
)
session.flush()
+ Stats.incr("dataset.changed_datasets")
Review Comment:
```suggestion
Stats.incr("dataset.updates")
```
I like this change! Metrics feel a bit like an afterthought in many changes.
I do however suggest a different name, because:
- The docs speak about "updates" instead of changes:
https://airflow.apache.org/docs/apache-airflow/stable/concepts/datasets.html,
would stick to that terminology for consistency
- Remove "datasets" because it's already in the name
##########
airflow/jobs/scheduler_job.py:
##########
@@ -1633,6 +1634,9 @@ def _orphan_unreferenced_datasets(self, session: Session
= NEW_SESSION) -> None:
)
)
)
+ orphaned_datasets_count = 0
for dataset in orphaned_dataset_query:
self.log.info("Orphaning unreferenced dataset '%s'", dataset.uri)
dataset.is_orphaned = expression.true()
+ orphaned_datasets_count += 1
+ Stats.incr("dataset.orphaned_datasets", count=orphaned_datasets_count)
Review Comment:
```suggestion
Stats.incr("dataset.orphaned", count=orphaned_datasets_count)
```
Same comment here, "dataset" is already in the metric name so would
deduplicate.
##########
docs/apache-airflow/administration-and-deployment/logging-monitoring/metrics.rst:
##########
@@ -121,6 +121,10 @@ Name
Descripti
fully
asynchronous)
``triggers.failed`` Number
of triggers that errored before they could fire an event
``triggers.succeeded`` Number
of triggers that have fired at least one event
+``dataset.changed_datasets`` Number
of changed datasets
+``dataset.orphaned_datasets`` Number
of datasets marked as orphans because they are no longer referenced in DAG
+
schedule parameters or task outlets
+``dataset.triggered_runs`` Number
of runs triggered by a dataset change
Review Comment:
```suggestion
``dataset.updates`` Number of
updated datasets
``dataset.orphaned`` Number of
datasets marked as orphans because they are no longer referenced in DAG
schedule parameters or task outlets
``dataset.triggered_runs``
Number of runs triggered by a dataset update
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]