waynr commented on issue #7135: URL: https://github.com/apache/arrow-rs/issues/7135#issuecomment-2666472741
> Currently the tracing for DataFusion is constructed based on the timing metrics collected during execution. This is itself an approximation, e.g. it aggregates across partitions, to keep trace size manageable.

I'm not entirely sure what you mean here. Do you have a code example I could look at to get a better understanding? I haven't seen any explicit tracing/span reporting anywhere in DataFusion (but I also haven't spent much time digging around the code yet).

> I suspect this is inevitable unless tracing were integrated into DataFusion as a first-class concept.

I think we can avoid making tracing a first-class concept in DataFusion and still get properly-parented spans. One way to do that is what I've proposed in this issue, but I understand it's a big ask with some possibly unpalatable complexity. Another way that should work in the contexts I care about, which I was just discussing with @crepererum this morning, would be to update [`GetOptions`](https://docs.rs/object_store/latest/object_store/struct.GetOptions.html) to accept custom metadata so that we can pass parent span IDs in as strings. The only thing I'm not certain about with that approach is whether the parquet file accesses that we care about in influxdb3 actually use `get_range` or `get_ranges`, which don't take a `GetOptions` struct.

Regardless, @tustvold would you like me to file a separate issue to propose that?

For what it's worth, I am also looking into a per-query wrapper around the `MemCachedObjectStore` that I've mentioned previously, so that we can at least get query-level trace spans. I'm not confident that it's possible without losing some of the benefits of the current process-global `Arc<dyn ObjectStore>` that we use to initialize various DataFusion and iox components.
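To make the `GetOptions` idea a bit more concrete, here is a rough, self-contained sketch. Everything in it is hypothetical: the `GetOptions` struct below is a local stand-in (the real `object_store::GetOptions` has no `metadata` field today), and `PARENT_SPAN_KEY`, `SpanRecord`, and `TracedStore` are names made up for illustration. It only shows the shape of the proposal: a shared store reads a caller-supplied parent span ID from the options and attaches it to the span it records.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical stand-in for `object_store::GetOptions`. The `metadata`
/// field is the proposed addition; it is NOT part of the real struct.
#[derive(Debug, Default, Clone)]
pub struct GetOptions {
    pub metadata: HashMap<String, String>,
}

/// Hypothetical metadata key under which a caller passes its span ID.
pub const PARENT_SPAN_KEY: &str = "parent_span_id";

/// Minimal record of a span emitted by the store (illustrative only).
#[derive(Debug, Clone, PartialEq)]
pub struct SpanRecord {
    pub name: String,
    pub parent_span_id: Option<String>,
}

/// Toy in-memory store standing in for a process-global
/// `Arc<dyn ObjectStore>`; it is not the real trait.
#[derive(Default)]
pub struct TracedStore {
    objects: HashMap<String, Vec<u8>>,
    spans: Mutex<Vec<SpanRecord>>,
}

impl TracedStore {
    /// Store `data` under `path`.
    pub fn put(&mut self, path: &str, data: &[u8]) {
        self.objects.insert(path.to_string(), data.to_vec());
    }

    /// Fetch an object and record a span. The span is parented to the ID
    /// found in `options.metadata`, or left unparented if none was passed.
    pub fn get_opts(&self, path: &str, options: &GetOptions) -> Option<&[u8]> {
        let parent_span_id = options.metadata.get(PARENT_SPAN_KEY).cloned();
        self.spans.lock().unwrap().push(SpanRecord {
            name: format!("get {path}"),
            parent_span_id,
        });
        self.objects.get(path).map(Vec::as_slice)
    }

    /// Snapshot of the spans recorded so far.
    pub fn spans(&self) -> Vec<SpanRecord> {
        self.spans.lock().unwrap().clone()
    }
}
```

A caller would insert its current span ID under `PARENT_SPAN_KEY` before calling `get_opts`, so the store itself can stay process-global. Note that this sketch only covers the path that takes options; it does nothing for `get_range` or `get_ranges`, which is exactly the open question above.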