erratic-pattern commented on issue #9415:
URL: https://github.com/apache/datafusion/issues/9415#issuecomment-2635477969
> I am not clear what additional benefit more direct tracing integration in
> datafusion would provide, but I may be missing something
The `tracing` API is more granular than the information provided by the
current metrics, so while it's possible to convert the existing metrics into
`tracing` spans, there is some information that is inaccessible or impossible
to trace at the moment.
The main example I can think of is the exact timing of entering/exiting
execution of the async tasks. The DataFusion metrics record a "start" and "end"
timestamp for the whole operator, but they do not record when an operator
awaits and yields control to the executor.
The `tracing` API allows for this because a span can be entered and exited
multiple times before it is finally closed. This allows you to graph out
exactly when async tasks are running in relation to each other and for how long,
which can be helpful for identifying bottlenecks where a task is waiting a long
time for data from another task.
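
To make the enter/exit idea concrete, here is a minimal std-only sketch. It does
not use the `tracing` crate or any DataFusion API: `SpanLog`, `Instrumented`,
`YieldTimes`, and `run_instrumented` are hypothetical names made up for
illustration. The wrapper records one (enter, exit) pair per poll, which is
roughly the behaviour you get when a future is attached to a span with
`tracing`'s `Instrument::instrument`. A single start/end pair would only give
you the wall time; the per-poll intervals also show when the task was not
running.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a span: one (enter, exit) pair per poll.
#[derive(Default)]
struct SpanLog {
    intervals: Vec<(Instant, Instant)>,
}

impl SpanLog {
    /// Total time spent inside the span, excluding the gaps between polls.
    fn busy(&self) -> Duration {
        self.intervals
            .iter()
            .map(|(start, end)| end.duration_since(*start))
            .sum()
    }
}

/// Wraps a future and "enters" the span for the duration of every poll.
struct Instrumented<F> {
    inner: Pin<Box<F>>,
    log: SpanLog,
}

impl<F: Future> Future for Instrumented<F> {
    type Output = F::Output;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<F::Output> {
        let entered = Instant::now();
        let result = self.inner.as_mut().poll(cx);
        // Exit the span whether the inner future finished or yielded.
        self.log.intervals.push((entered, Instant::now()));
        result
    }
}

/// Toy future that returns `Pending` the given number of times, then completes.
struct YieldTimes(u32);

impl Future for YieldTimes {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 == 0 {
            return Poll::Ready(());
        }
        self.0 -= 1;
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

/// Waker that does nothing; the loop below simply polls again.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls `fut` to completion, sleeping for `idle` between polls to simulate
/// the executor running other tasks. Returns the span log and the wall time.
fn run_instrumented<F: Future>(fut: F, idle: Duration) -> (SpanLog, Duration) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let started = Instant::now();
    let mut task = Instrumented { inner: Box::pin(fut), log: SpanLog::default() };
    while Pin::new(&mut task).poll(&mut cx).is_pending() {
        thread::sleep(idle);
    }
    (task.log, started.elapsed())
}

fn main() {
    let (log, wall) = run_instrumented(YieldTimes(3), Duration::from_millis(5));
    println!("polls: {}", log.intervals.len());
    println!("busy: {:?}, wall: {:?}", log.busy(), wall);
}
```

In this sketch the future is polled four times, so the log holds four separate
intervals, and the busy time comes out much smaller than the wall time because
the gaps between polls are excluded.
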
You can kinda use the `elapsed_compute` metric for this purpose, if you are
only interested in identifying which operators are slow or fast, but the added
granularity that you get from `tracing` would make it easier to visualize the
actual path that control flow takes when tasks yield and resume.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]