Hi Daniel,

I was the author of the current OTel spans approach and I would like to
share some context.

The spans for the dag_run and the tasks don't add much value on their own,
because all of that info is already available in the web server. The main
benefit is the ability to add sub-spans under tasks and monitor individual
steps or external operations. That's why it was designed this way.

You need the task span to stay active long enough to get its context and
propagate it under the task. After that you can use it to create sub-spans.
If the parent span isn't active while you are creating the sub-span, the
visual result won't make much sense.
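
A minimal sketch of that propagation step, using only the standard library
rather than the OpenTelemetry SDK. The names SpanContext, inject, extract and
start_sub_span are hypothetical and exist only for illustration; the one
borrowed detail is the W3C traceparent layout (version-trace_id-span_id-flags).
It is not how Airflow implements this.

```python
from __future__ import annotations

import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class SpanContext:
    """Hypothetical stand-in for a span's identity (not the OTel SDK class)."""

    trace_id: str  # 32 lowercase hex characters
    span_id: str   # 16 lowercase hex characters


def inject(ctx: SpanContext) -> dict[str, str]:
    """Serialize the context into a carrier dict in the traceparent layout."""
    return {"traceparent": f"00-{ctx.trace_id}-{ctx.span_id}-01"}


def extract(carrier: dict[str, str]) -> SpanContext:
    """Rebuild the parent context from a carrier."""
    _version, trace_id, span_id, _flags = carrier["traceparent"].split("-")
    return SpanContext(trace_id=trace_id, span_id=span_id)


def start_sub_span(carrier: dict[str, str]) -> tuple[SpanContext, str]:
    """Create a child context: same trace_id, fresh span_id.

    Returns the child context and the span_id of its parent.
    """
    parent = extract(carrier)
    child = SpanContext(trace_id=parent.trace_id, span_id=secrets.token_hex(8))
    return child, parent.span_id


# The task span is created by whatever process runs the task ...
task_span = SpanContext(trace_id=secrets.token_hex(16), span_id=secrets.token_hex(8))
carrier = inject(task_span)

# ... and code under the task only needs the carrier to attach sub-spans.
sub_span, parent_span_id = start_sub_span(carrier)
```

The sub-span shares the task span's trace_id and records the task span as its
parent, which is what lets a tracing backend draw it nested under the task.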

Initially, I had an idea to create a short span at the beginning of the
task and another short span at the end, but that doesn't look good when
there are multiple sub-spans: it's hard to figure out the order of the
steps and where they sit within the dag_run.

*Possible tree*
dag_run    [------------------------------------------]
task_1        [-----]
task_2                [----------------------------------]
task_2_1                [--------]
task_2_2                           [----]
task_2_3                                 [----------------]

The implementation ended up being quite complex due to scheduler HA. The
span objects can't be shared outside of the process that created them; they
are thread local. A dag_run can run for so long that the scheduler that
started processing it isn't the one that marks it as finished. The benefit
of having an active span in each scheduler that keeps track of a dag_run is
a clear picture of individual task steps and continuous observability.
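
A toy model of the HA problem, under stated assumptions: Scheduler,
active_spans, carrier_store and the returned strings are all hypothetical,
and carrier_store is just a dict standing in for shared storage. It only
illustrates that a live span object exists in one process, while a
serializable carrier can be read by any of them.

```python
from __future__ import annotations


class Scheduler:
    """Toy scheduler that keeps live span objects in process-local memory."""

    def __init__(self, name: str) -> None:
        self.name = name
        # run_id -> live span object; never visible to other schedulers.
        self.active_spans: dict[str, dict] = {}

    def start_run(self, run_id: str, carrier_store: dict[str, dict[str, str]]) -> None:
        self.active_spans[run_id] = {"run_id": run_id, "ended": False}
        # Only a serializable carrier can be persisted for other processes.
        # The IDs below are fixed placeholders, not real trace data.
        carrier_store[run_id] = {"traceparent": "00-" + "a" * 32 + "-" + "b" * 16 + "-01"}

    def finish_run(self, run_id: str, carrier_store: dict[str, dict[str, str]]) -> str:
        span = self.active_spans.pop(run_id, None)
        if span is not None:
            span["ended"] = True
            return "ended live span"
        if run_id in carrier_store:
            # This scheduler never held the span object, only its context.
            return "no live span; continue from stored carrier"
        return "unknown run"


store: dict[str, dict[str, str]] = {}
scheduler_a = Scheduler("scheduler_a")
scheduler_b = Scheduler("scheduler_b")

scheduler_a.start_run("run_1", store)
result_b = scheduler_b.finish_run("run_1", store)  # a different scheduler picks it up
result_a = scheduler_a.finish_run("run_1", store)  # the original scheduler
```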

I'm not saying that it shouldn't be simplified if possible. When I
contributed these changes, Airflow looked very different from what it is
now. For example, the task-sdk didn't exist and tasks had direct DB access.
If, for some reason, the task span ended and a new span was started, you
could get the context_carrier directly from the DB and use it to make the
new span the parent of any future sub-spans. That's not possible anymore.

I'm open to all ideas. I just wanted to explain the complexity.

Christos

On Tue, Feb 17, 2026 at 6:53 PM Daniel Standish <[email protected]>
wrote:

> I should add a bit of detail....
>
> What we currently do is create a span when scheduler realizes a dag run is
> running.  And then we store it in a dictionary.  Then when we detect the
> dag run ends we check in the dictionary and close it if we can.
>
> Similar for tasks.
>
> But these spans have to deal with scheduler restarts, different schedulers
> handling the same dag run and tasks etc.
>
> I don't think it's really the way you're supposed to create spans.  And I'm
> not one to say we always have to do things the "right" way.  But I don't
> like the complexity it introduces and I don't see the benefit.
>
> So in the PR I rip out all of those "active spans" dictionaries and just
> create spans when it makes sense and maintain the parent child
> relationships so you can still see the full flow in the end.
>
> On Tue, Feb 17, 2026 at 8:45 AM Daniel Standish <[email protected]>
> wrote:
>
> > Hi I am looking at our OTEL stuff and I have reached the conclusion that
> > we should rework it so we don't jump through hoops to keep alive very long
> > spans for dag run and task.
> >
> > We should still have the spans, we just shouldn't jump through hoops to
> > ensure that their start and end times match those in the metastore.
> >
> > Indeed, that's information that's always already available!
> >
> > I do not claim to be an OTEL expert.
> >
> > But intuitively we can see that the current approach is very complicated
> > and confusing, and therein probably less reliable and certainly less
> > maintainable.
> >
> > You can see what I've done so far here:
> > https://github.com/apache/airflow/pull/61897
> >
> > Sycophantic though it may be, chat gpt seems to agree with me:
> > https://chatgpt.com/share/698fea02-fa18-8013-93f7-1e26215bc3f6
> >
> > And it makes sense -- just make spans that will automatically be closed
> > when your specific action is over.
> >
> > WDYT?
> >
> > Thanks
> >
>
