BlakeOrth commented on issue #17207: URL: https://github.com/apache/datafusion/issues/17207#issuecomment-3207616233
I'm going to do my best to move this forward in hopes that it helps enable us to get positive movement on various issues noted in #17214. I've already stubbed out a very basic implementation that suggests completing this with the implementation suggestion is possible. However, as expected, there are some subtle complexities that I ran into that weren't immediately obvious. The easier of the two items to address is providing access to the instrumented object store(s) for each statement execution. Unfortunately, [`ObjectStore`](https://docs.rs/object_store/0.12.3/object_store/trait.ObjectStore.html#) does not implement `Any`, so there's no way to coerce it back to a concrete implementation. The same applies to the [`ObjectStoreRegistry`](https://docs.rs/datafusion/latest/datafusion/datasource/object_store/trait.ObjectStoreRegistry.html). I believe this more or less forces us to keep access to the concrete implementations (barring using unsafe operations such as `mem::transmute`). The implementation can stay entirely focused in `datafusion-cli` if we do something like thread a variable through all of the various `exec_` functions. It's not terribly elegant, but it is functional. I think there's probably also a solution that leverages a [`LazyLock`](https://doc.rust-lang.org/beta/std/sync/struct.LazyLock.html) if we think it would be preferable to keep it as a global variable. The more difficult item to address is tracking the duration of any `ObjectStore` trait method that returns a `BoxStream` instead of a standard future. I think the key method that's an issue here is `list()` as it's a method I know DataFusion leverages every query when using a ListingTable. The "start time" for a stream operation shouldn't start until an item on the stream is awaited. There's no reasonable duration to track until the futures in the stream are awaited. I believe it should be possible to wrap the futures in the stream in an instrumented future, and in theory I think we could obtain a reasonable duration by keeping track of the amount of time a future spends in `Poll::Pending`, but this is substantially more complex than the simple instrumentation that can be applied to most methods. Does anyone have any recommendations or guidance on the desired path forward here? cc @alamb -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org