nickstenning commented on code in PR #61897:
URL: https://github.com/apache/airflow/pull/61897#discussion_r2836210615


##########
airflow-core/src/airflow/serialization/definitions/dag.py:
##########
@@ -70,6 +71,18 @@
 log = structlog.get_logger(__name__)
 
 
+def add_dagrun_span(func):
+    @functools.wraps(func)
+    def wrapper(self, *args, **kwargs):
+        tracer = Trace.get_tracer("dagrun")
+        with tracer.start_as_current_span("create_dagrun") as span:

Review Comment:
   I don't think it's an either/or here. The principal audience of a span name 
is humans. For _most_ dags and _most_ tasks, I think on reflection I agree with 
@xBis7 that the right thing to do is to put the dag or task ID in the span 
name. This could get a bit messy for dynamic dags that generate 100s or 1000s 
of tasks with unhelpful names or uuids or something, but that's a choice that I 
don't think we need to worry about too much here.
   
   That said, I don't think the span name should be _just_ the task ID. It 
should be _after_ some fixed string which indicates the activity, e.g. 
`task.run my_task_name`. For comparison, see the span names added by most HTTP 
instrumentation, which look like `GET /`, `PUT /accounts/:id`, etc.
   
   **And** I would suggest having both the activity and the relevant ID in 
their own attributes in the span, e.g.
   
   ```json
   {"airflow.task.id": "my_task_name", "airflow.activity": "task.run"}
   ```
   
   Attributes are cheap. Always err on the side of having attributes.
   
   Lastly, I'd advise thinking about attribute names up front. A pattern that 
has mostly worked well for me over the years is always namespacing attribute 
names, but trying to avoid unnecessary nesting. For example: `task.id`, 
`task.operator`, `task_instance.id`, `task_instance.state`, `dag.id`. Airflow 
_might_ want to consider putting everything under an `airflow.` namespace, e.g. 
`airflow.task.id` etc.
   
   The "unnecessary nesting" thing is a bit subtle, but the point is to use the 
same attribute name for the same data _in every context_. So, for example, 
don't start doing stuff like `airflow.dag_run.task.id`, because a task id is 
also meaningful outside of the context of a dag run (e.g. the trace could be 
for an HTTP request to load the some metadata associated with that task) and so 
should live at a stable name, e.g. `airflow.task.id`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to