BasPH commented on code in PR #28907:
URL: https://github.com/apache/airflow/pull/28907#discussion_r1070102683


##########
airflow/datasets/manager.py:
##########
@@ -66,6 +67,7 @@ def register_dataset_change(
             )
         )
         session.flush()
+        Stats.incr("dataset.changed_datasets")

Review Comment:
   ```suggestion
           Stats.incr("dataset.updates")
   ```
   
   I like this change! Metrics feel a bit like an afterthought in many changes. 
I do however suggest a different name, because:
   
   - The docs speak about "updates" instead of changes: 
https://airflow.apache.org/docs/apache-airflow/stable/concepts/datasets.html, 
would stick to that terminology for consistency
   - Remove "datasets" because it's already in the name



##########
airflow/jobs/scheduler_job.py:
##########
@@ -1633,6 +1634,9 @@ def _orphan_unreferenced_datasets(self, session: Session 
= NEW_SESSION) -> None:
                 )
             )
         )
+        orphaned_datasets_count = 0
         for dataset in orphaned_dataset_query:
             self.log.info("Orphaning unreferenced dataset '%s'", dataset.uri)
             dataset.is_orphaned = expression.true()
+            orphaned_datasets_count += 1
+        Stats.incr("dataset.orphaned_datasets", count=orphaned_datasets_count)

Review Comment:
   ```suggestion
           Stats.incr("dataset.orphaned", count=orphaned_datasets_count)
   ```
   
   Same comment here, "dataset" is already in the metric name so would 
deduplicate.



##########
docs/apache-airflow/administration-and-deployment/logging-monitoring/metrics.rst:
##########
@@ -121,6 +121,10 @@ Name                                                       
            Descripti
                                                                        fully 
asynchronous)
 ``triggers.failed``                                                    Number 
of triggers that errored before they could fire an event
 ``triggers.succeeded``                                                 Number 
of triggers that have fired at least one event
+``dataset.changed_datasets``                                           Number 
of changed datasets
+``dataset.orphaned_datasets``                                          Number 
of datasets marked as orphans because they are no longer referenced in DAG
+                                                                       
schedule parameters or task outlets
+``dataset.triggered_runs``                                             Number 
of runs triggered by a dataset change

Review Comment:
   ```suggestion
   ``dataset.updates``                                           Number of 
updated datasets
   ``dataset.orphaned``                                          Number of 
datasets marked as orphans because they are no longer referenced in DAG
                                                                          
schedule parameters or task outlets
   ``dataset.triggered_runs``                                             
Number of runs triggered by a dataset update
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to