Thanks, Christos. Please take a look at the screenshots in my PR.
You can see what it looks like: I am able to use the parent dag run and task spans even down into the task SDK. The only thing you don't get is a green bar that lasts the whole time, but to me this doesn't really matter. Please take a look when you have a chance. I think it's a good approach with a very small trade-off.

On Tue, Feb 17, 2026 at 9:37 AM Christos Bisias wrote:
> Hi Daniel,
>
> I was the author of the current OTel spans approach, and I would like to share some context.
>
> The spans for the dag_run and the tasks aren't adding much value on their own, because all that info is already available in the web server. The main benefit is the ability to add sub-spans under tasks and monitor individual steps or external operations. That's why it was designed this way.
>
> You need the task span to stay active long enough to get its context and propagate it under the task. After that, you can use it to create sub-spans. If the parent span isn't active while you are creating the sub-span, the visual result won't make much sense.
>
> Initially, I had an idea to create a short span at the beginning of the task and another short span at the end, but it doesn't look good when there are multiple sub-spans, and it's hard to figure out the order of the steps and where they fall within the dag_run.
>
> *Possible tree*
> dag_run [------------------------------------------]
>   task_1 [-----]
>   task_2 [----------------------------------]
>     task_2_1 [--------]
>     task_2_2 [----]
>     task_2_3 [----------------]
>
> The implementation ended up being quite complex due to scheduler HA. The span objects can't be shared outside of the process that created them; they are thread-local. One dag_run can run for so long that the scheduler that started processing it isn't the one that marks it as finished.
> The benefit of having an active span for each scheduler that keeps track of a dag_run is the ability to get a clear picture of individual task steps and continuous observability.
>
> I'm not saying that it shouldn't be simplified if possible. When I contributed these changes, Airflow looked very different from what it is now. For example, the task-sdk didn't exist and tasks had direct DB access. If for some reason the task span ended and a new span was started, you could get the context_carrier directly from the DB and use it to make the new span the parent of any future sub-spans. That's not possible anymore.
>
> I'm open to all ideas. I just wanted to explain the complexity.
>
> Christos
>
> On Tue, Feb 17, 2026 at 6:53 PM Daniel Standish wrote:
> > I should add a bit of detail...
> >
> > What we currently do is create a span when the scheduler realizes a dag run is running, and then store it in a dictionary. Then, when we detect that the dag run has ended, we look it up in the dictionary and close it if we can.
> >
> > Similar for tasks.
> >
> > But these spans have to deal with scheduler restarts, different schedulers handling the same dag run and tasks, etc.
> >
> > I don't think that's really the way you're supposed to create spans. And I'm not one to say we always have to do things the "right" way. But I don't like the complexity it introduces, and I don't see the benefit.
> >
> > So in the PR I rip out all of those "active spans" dictionaries and just create spans when it makes sense, maintaining the parent-child relationships so you can still see the full flow in the end.
> >
> > On Tue, Feb 17, 2026 at 8:45 AM Daniel Standish wrote:
> > > Hi, I am looking at our OTel stuff and I have reached the conclusion that we should rework it so we don't jump through hoops to keep very long spans for dag run and task alive.
> > >
> > > We should still have the spans; we just shouldn't jump through hoops to ensure that their start and end times match those in the metastore.
> > >
> > > Indeed, that's information that's always already available!
> > >
> > > I do not claim to be an OTel expert.
> > >
> > > But intuitively we can see that the current approach is very complicated and confusing, and therefore probably less reliable and certainly less maintainable.
> > >
> > > You can see what I've done so far here:
> > > https://github.com/apache/airflow/pull/61897
> > >
> > > Sycophantic though it may be, ChatGPT seems to agree with me:
> > > https://chatgpt.com/share/698fea02-fa18-8013-93f7-1e26215bc3f6
> > >
> > > And it makes sense: just make spans that will automatically be closed when your specific action is over.
> > >
> > > WDYT?
> > >
> > > Thanks
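To make the scheduler-HA problem described in this thread concrete, here is a stdlib-only Python toy model. It is a hypothetical sketch, not Airflow code: `ToySpan`, `ToyScheduler`, `on_dag_run_running`, and `on_dag_run_finished` are illustrative names. It shows only the naive form of the "active spans" dictionary; per Christos, the real implementation adds extra machinery to cope with exactly this hand-off case.

```python
"""Hypothetical sketch: a process-local "active spans" dict under scheduler HA.

ToySpan and ToyScheduler are illustrative stand-ins, not Airflow's implementation.
"""
from __future__ import annotations

import time
from dataclasses import dataclass, field


@dataclass
class ToySpan:
    name: str
    start: float = field(default_factory=time.monotonic)
    end: float | None = None

    def close(self) -> None:
        self.end = time.monotonic()


class ToyScheduler:
    """Keeps long-lived spans in a dict that only this process can see."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.active_spans: dict[str, ToySpan] = {}

    def on_dag_run_running(self, run_id: str) -> None:
        # Open a span and keep it alive until the run finishes.
        self.active_spans[run_id] = ToySpan(f"dag_run:{run_id}")

    def on_dag_run_finished(self, run_id: str) -> bool:
        # Only the process that opened the span holds the object to close it.
        span = self.active_spans.pop(run_id, None)
        if span is None:
            return False  # Span lives in another process (or was lost on restart).
        span.close()
        return True


scheduler_a = ToyScheduler("a")
scheduler_b = ToyScheduler("b")

scheduler_a.on_dag_run_running("run_1")
# With HA, a different scheduler may be the one that sees the run finish.
closed_by_b = scheduler_b.on_dag_run_finished("run_1")
print(closed_by_b)  # False: scheduler_b never saw the span
print(scheduler_a.active_spans["run_1"].end)  # None: the span is still open in scheduler_a
```

Handling this naively leaves the dag_run span open forever in scheduler `a`, which is why the current design needs cross-scheduler bookkeeping.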
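For comparison, here is a minimal sketch of the direction Daniel proposes, under stated assumptions. Spans are opened with a context manager, so they close when the action ends. Parent-child links survive through a serialized carrier, modeled on the W3C `traceparent` header, rather than through a live parent span object. `start_span`, `to_carrier`, `from_carrier`, and `FINISHED` are hypothetical names. This is a stdlib-only toy, not the OpenTelemetry SDK and not the code in the PR.

```python
"""Hypothetical sketch: short-lived spans linked through a serialized context carrier.

Mimics the W3C traceparent layout; not the OpenTelemetry SDK, not Airflow's code.
"""
from __future__ import annotations

import secrets
import time
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator


@dataclass
class ToySpan:
    name: str
    trace_id: str
    span_id: str
    parent_id: str | None
    start: float
    end: float | None = None


FINISHED: list[ToySpan] = []  # Stand-in for an exporter.


def to_carrier(span: ToySpan) -> dict[str, str]:
    # traceparent layout: version-trace_id-parent_span_id-flags
    return {"traceparent": f"00-{span.trace_id}-{span.span_id}-01"}


def from_carrier(carrier: dict[str, str]) -> tuple[str, str]:
    _version, trace_id, span_id, _flags = carrier["traceparent"].split("-")
    return trace_id, span_id


@contextmanager
def start_span(name: str, carrier: dict[str, str] | None = None) -> Iterator[ToySpan]:
    """Open a span that is closed automatically when the with-block exits."""
    if carrier:
        trace_id, parent_id = from_carrier(carrier)
    else:
        trace_id, parent_id = secrets.token_hex(16), None
    span = ToySpan(name, trace_id, secrets.token_hex(8), parent_id, time.monotonic())
    try:
        yield span
    finally:
        span.end = time.monotonic()
        FINISHED.append(span)


# Scheduler side: a brief dag_run span whose carrier could be persisted (e.g. in the DB).
with start_span("dag_run") as dag_run_span:
    dag_run_carrier = to_carrier(dag_run_span)

# Later, possibly in another process: a task span parented via the stored carrier.
with start_span("task_2", dag_run_carrier) as task_span:
    task_carrier = to_carrier(task_span)  # would be handed down to the task
    with start_span("task_2_1", task_carrier) as step_span:
        pass  # the individual step being monitored

print([s.name for s in FINISHED])  # ['dag_run', 'task_2_1', 'task_2']
```

Here the dag_run span has already ended when `task_2` starts, yet `task_2` still records it as its parent. That is the trade-off debated above: the hierarchy is preserved, but the parent no longer has a bar covering its children, and how a given tracing UI renders that is the open question Christos raises.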
