I should add a bit more detail...

What we currently do is create a span when the scheduler realizes a dag run
is running, and then store it in a dictionary.  When we detect that the dag
run has ended, we look it up in the dictionary and close it if we can.

Similarly for tasks.

But these spans have to cope with scheduler restarts, with different
schedulers handling the same dag run and its tasks, etc.
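A minimal sketch of the "active spans" pattern described above, in plain Python. Everything here (the toy Span class, Scheduler, on_dagrun_running, on_dagrun_finished) is hypothetical and is not Airflow's or OpenTelemetry's actual code; it only illustrates why an in-memory dict keyed by dag run breaks down when a different scheduler process, or the same one after a restart, is the one that sees the run finish.

```python
from __future__ import annotations

import time
import uuid
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Span:
    """Toy stand-in for a tracing span (not the OpenTelemetry API)."""

    name: str
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])
    start: float = field(default_factory=time.monotonic)
    end: Optional[float] = None


class Scheduler:
    """Hypothetical scheduler holding long-lived dag run spans in memory."""

    def __init__(self) -> None:
        # Keyed by (dag_id, run_id); this dict lives only as long as the process.
        self.active_dagrun_spans: Dict[Tuple[str, str], Span] = {}

    def on_dagrun_running(self, dag_id: str, run_id: str) -> None:
        # Open a span the first time this process notices the run is running.
        key = (dag_id, run_id)
        if key not in self.active_dagrun_spans:
            self.active_dagrun_spans[key] = Span(name=f"dag_run {dag_id}")

    def on_dagrun_finished(self, dag_id: str, run_id: str) -> bool:
        # Only the process that opened the span can find it and close it.
        span = self.active_dagrun_spans.pop((dag_id, run_id), None)
        if span is None:
            return False  # opened by another scheduler, or lost on restart
        span.end = time.monotonic()
        return True


first = Scheduler()
first.on_dagrun_running("example_dag", "run_1")

# A restarted (or second) scheduler starts with an empty dict, so it
# cannot close the span that the first process opened.
second = Scheduler()
print(second.on_dagrun_finished("example_dag", "run_1"))  # False
```

The span held by `first` is only ever ended if that same process also observes the run finishing, which is the kind of bookkeeping the "active spans" dictionaries exist to manage.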

I don't think it's really the way you're supposed to create spans.  And I'm
not one to say we always have to do things the "right" way.  But I don't
like the complexity it introduces and I don't see the benefit.

So in the PR I rip out all of those "active spans" dictionaries and just
create spans when it makes sense, maintaining the parent/child
relationships so you can still see the full flow at the end.
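For contrast, a rough sketch of the scoped approach, again with hypothetical names and a toy Span rather than the OpenTelemetry SDK. The assumption here (not taken from the PR) is that the trace id and the id of a short root span are persisted with the dag run, so any scheduler can open its own short-lived span as a child of it, and each span is closed automatically when its `with` block exits.

```python
from __future__ import annotations

import contextlib
import time
import uuid
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Span:
    """Toy stand-in for a tracing span (not the OpenTelemetry API)."""

    name: str
    trace_id: str
    parent_id: Optional[str] = None
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])
    start: float = field(default_factory=time.monotonic)
    end: Optional[float] = None


FINISHED: List[Span] = []  # stands in for a span exporter


@contextlib.contextmanager
def span(name: str, trace_id: str, parent_id: Optional[str] = None) -> Iterator[Span]:
    # The span is always ended when the block exits, even on error.
    s = Span(name=name, trace_id=trace_id, parent_id=parent_id)
    try:
        yield s
    finally:
        s.end = time.monotonic()
        FINISHED.append(s)


@dataclass
class DagRunRow:
    """Hypothetical metastore row carrying the ids needed to parent later spans."""

    dag_id: str
    run_id: str
    trace_id: str = ""
    root_span_id: str = ""


def create_dag_run(dag_id: str, run_id: str) -> DagRunRow:
    row = DagRunRow(dag_id, run_id, trace_id=uuid.uuid4().hex)
    with span(f"create_dag_run {dag_id}", trace_id=row.trace_id) as root:
        row.root_span_id = root.span_id  # persisted so any scheduler can find it
    return row


def schedule_task(row: DagRunRow, task_id: str) -> None:
    # Any scheduler process, even after a restart, can parent a span from the row.
    with span(f"schedule {task_id}", trace_id=row.trace_id, parent_id=row.root_span_id):
        pass  # the real scheduling work would happen here


row = create_dag_run("example_dag", "run_1")
schedule_task(row, "extract")
schedule_task(row, "load")
for s in FINISHED:
    print(s.name, s.parent_id == row.root_span_id)
```

No dictionary of open spans is needed: every span is finished by the time its block returns, and the trace still links together through the stored ids. With the real OpenTelemetry Python API, the role of the `span` helper would be played by something like `tracer.start_as_current_span(...)`, which also ends the span when the `with` block exits.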

On Tue, Feb 17, 2026 at 8:45 AM Daniel Standish <[email protected]>
wrote:

> Hi, I am looking at our OTEL stuff and I have reached the conclusion that
> we should rework it so we don't jump through hoops to keep very long-lived
> spans alive for dag runs and tasks.
>
> We should still have the spans, we just shouldn't jump through hoops to
> ensure that their start and end times match those in the metastore.
>
> Indeed, that's information that's always already available!
>
> I do not claim to be an OTEL expert.
>
> But intuitively we can see that the current approach is very complicated
> and confusing, and therefore probably less reliable and certainly less
> maintainable.
>
> You can see what I've done so far here:
> https://github.com/apache/airflow/pull/61897
>
> Sycophantic though it may be, ChatGPT seems to agree with me:
> https://chatgpt.com/share/698fea02-fa18-8013-93f7-1e26215bc3f6
>
> And it makes sense -- just make spans that will automatically be closed
> when your specific action is over.
>
> WDYT?
>
> Thanks
>
