eladkal commented on code in PR #38650:
URL: https://github.com/apache/airflow/pull/38650#discussion_r1546019417


##########
docs/apache-airflow/authoring-and-scheduling/datasets.rst:
##########
@@ -232,20 +232,36 @@ Attaching extra information to an emitting Dataset Event
 A task with a dataset outlet can optionally attach extra information before it 
emits a dataset event. This is different
 from `Extra information on Dataset`_. Extra information on a dataset 
statically describes the entity pointed to by the dataset URI; extra 
information on the *dataset event* instead should be used to annotate the 
triggering data change, such as how many rows in the database are changed by 
the update, or the date range covered by it.
 
-The easiest way to attach extra information to the dataset event is by 
accessing ``dataset_events`` in a task's execution context:
+The easiest way to attach extra information to the dataset event is by 
``yield``-ing a ``Metadata`` object from a task:
 
 .. code-block:: python
 
+    from airflow.datasets import Dataset
+    from airflow.datasets.metadata import Metadata
+
     example_s3_dataset = Dataset("s3://dataset/example.csv")
 
 
     @task(outlets=[example_s3_dataset])
-    def write_to_s3(*, dataset_events):
+    def write_to_s3():
         df = ...  # Get a Pandas DataFrame to write.
         # Write df to dataset...
+        yield Metadata(example_s3_dataset, {"row_count": len(df)})
+
+Airflow automatically collects all yielded metadata, and populates dataset 
events with extra information for corresponding metadata objects.
+
+This can also be done in classic operators. The best way is to subclass the 
operator and override ``execute``. Alternatively, extras can also be added in a 
task's ``pre_execute`` or ``post_execute`` hook.

Review Comment:
   `pre_execute` and `post_execute` don't have a retry mechanism, so I assume
   there can be cases where not all data is collected. So there are pros and
   cons to doing it in `execute()` versus in the pre/post hooks.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

