BlakeOrth commented on issue #17207:
URL: https://github.com/apache/datafusion/issues/17207#issuecomment-3207616233

   I'm going to do my best to move this forward in hopes that it helps enable 
us to get positive movement on various issues noted in #17214. I've already 
stubbed out a very basic implementation that suggests completing this with the 
implementation suggestion is possible. However, as expected, there are some 
subtle complexities that I ran into that weren't immediately obvious.
   
   The easier of the two items to address is providing access to the 
instrumented object store(s) for each statement execution. Unfortunately, 
[`ObjectStore`](https://docs.rs/object_store/0.12.3/object_store/trait.ObjectStore.html#)
 does not implement `Any`, so there's no way to coerce it back to a concrete 
implementation. The same applies to the 
[`ObjectStoreRegistry`](https://docs.rs/datafusion/latest/datafusion/datasource/object_store/trait.ObjectStoreRegistry.html).
 I believe this more or less forces us to keep access to the concrete 
implementations (barring using unsafe operations such as `mem::transmute`). The 
implementation can stay entirely focused in `datafusion-cli` if we do something 
like thread a variable through all of the various `exec_` functions. It's not 
terribly elegant, but it is functional. I think there's probably also a 
solution that leverages a 
[`LazyLock`](https://doc.rust-lang.org/beta/std/sync/struct.LazyLock.html) if 
we think it would be preferable to keep 
 it as a global variable.
   
   The more difficult item to address is tracking the duration of any 
`ObjectStore` trait method that returns a `BoxStream` instead of a standard 
future. I think the key method that's an issue here is `list()` as it's a 
method I know DataFusion leverages every query when using a ListingTable. The 
"start time" for a stream operation shouldn't start until an item on the stream 
is awaited. There's no reasonable duration to track until the futures in the 
stream are awaited. I believe it should be possible to wrap the futures in the 
stream in an instrumented future, and in theory I think we could obtain a 
reasonable duration by keeping track of the amount of time a future spends in 
`Poll::Pending`, but this is substantially more complex than the simple 
instrumentation that can be applied to most methods.
   
   Does anyone have any recommendations or guidance on the desired path forward 
here?
   
   cc @alamb 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to