waynr commented on issue #7135:
URL: https://github.com/apache/arrow-rs/issues/7135#issuecomment-2666472741

   > Currently the tracing for DataFusion is constructed based on the timing 
metrics collected during execution. This is itself an approximation, e.g. it 
aggregates across partitions, to keep trace size manageable.
   
   I'm not entirely sure what you mean here. Do you have a code example I could 
look at to get a better understanding? I haven't seen any explicit tracing/span 
reporting anywhere in DataFusion (but I also haven't spent much time digging 
around the code yet).
   
   > I suspect this is inevitable unless tracing were integrated into 
DataFusion as a first-class concept.
   
   I think we can avoid making tracing a first-class concept in DataFusion and 
still get properly-parented spans. One way to do that is what I've proposed in 
this issue, but I understand it's a big ask with some possibly unpalatable 
complexity.
   
   Another way that should work in the contexts I care about, which I was just 
discussing with @crepererum this morning, would be to update 
[`GetOptions`](https://docs.rs/object_store/latest/object_store/struct.GetOptions.html)
 to accept custom metadata so that we can pass parent span IDs in as strings.
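   
   A minimal sketch of what I have in mind, to make the idea concrete. 
Everything here is hypothetical: `SketchGetOptions`, its `metadata` field, 
`PARENT_SPAN_KEY` and `describe_get` are stand-ins I made up for illustration 
and are not part of `object_store`.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `object_store::GetOptions`. The `metadata`
/// field is the proposed addition and is an assumption of this sketch.
#[derive(Debug, Default, Clone)]
pub struct SketchGetOptions {
    /// Free-form key/value pairs that a store implementation may inspect.
    pub metadata: HashMap<String, String>,
}

/// Hypothetical key under which the caller stores the parent span ID.
pub const PARENT_SPAN_KEY: &str = "parent-span-id";

impl SketchGetOptions {
    /// Attach the caller's span ID, encoded as a string.
    pub fn with_parent_span(mut self, span_id: &str) -> Self {
        self.metadata
            .insert(PARENT_SPAN_KEY.to_string(), span_id.to_string());
        self
    }

    /// Read the parent span ID back, if the caller supplied one.
    pub fn parent_span(&self) -> Option<&str> {
        self.metadata.get(PARENT_SPAN_KEY).map(String::as_str)
    }
}

/// What a tracing-aware store could do on each request: parent its own
/// span under the caller's span when an ID is present.
pub fn describe_get(path: &str, opts: &SketchGetOptions) -> String {
    match opts.parent_span() {
        Some(id) => format!("get {} (child of span {})", path, id),
        None => format!("get {} (root span)", path),
    }
}
```

   The store never needs to know about the caller's tracing library; it only 
sees an opaque string and decides what to do with it.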
   
   The only thing I'm not certain about with that approach is whether the 
Parquet file accesses that we care about in influxdb3 actually use 
`get_range` or `get_ranges`, which don't take a `GetOptions` struct. Regardless, 
@tustvold, would you like me to file a separate issue to propose that?
   
   For what it's worth, I am also trying out a per-query wrapper around the 
`MemCachedObjectStore` that I've mentioned previously, so that we can at least 
get query-level trace spans. I'm not confident that it's possible without 
losing some of the benefits of the current process-global 
`Arc<dyn ObjectStore>` that we use to initialize various DataFusion and iox 
components.
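   
   Roughly, the per-query wrapper would look like the sketch below. The 
`SketchStore` trait is a deliberately tiny, hypothetical stand-in for 
`ObjectStore`, and `SharedStore`, `PerQueryStore` and the string log are 
invented for illustration; real code would emit spans rather than push strings.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical, simplified stand-in for the `ObjectStore` trait.
pub trait SketchStore: Send + Sync {
    fn get(&self, path: &str) -> Vec<u8>;
}

/// Stand-in for the process-global store (for example, a caching store).
pub struct SharedStore;

impl SketchStore for SharedStore {
    fn get(&self, path: &str) -> Vec<u8> {
        // Pretend the object's content is just its path.
        path.as_bytes().to_vec()
    }
}

/// Per-query wrapper: shares the global store but knows which query it
/// serves, so each access can be recorded under that query's span.
pub struct PerQueryStore {
    inner: Arc<dyn SketchStore>,
    query_span: String,
    log: Arc<Mutex<Vec<String>>>,
}

impl PerQueryStore {
    pub fn new(
        inner: Arc<dyn SketchStore>,
        query_span: &str,
        log: Arc<Mutex<Vec<String>>>,
    ) -> Self {
        Self {
            inner,
            query_span: query_span.to_string(),
            log,
        }
    }
}

impl SketchStore for PerQueryStore {
    fn get(&self, path: &str) -> Vec<u8> {
        // Record the access under the query span, then delegate.
        self.log
            .lock()
            .unwrap()
            .push(format!("{} -> get {}", self.query_span, path));
        self.inner.get(path)
    }
}
```

   The catch is visible in the constructor: the wrapper has to be built per 
query, so any component that was initialized once with the global `Arc` would 
bypass it.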


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.