BlakeOrth opened a new issue, #18232:
URL: https://github.com/apache/datafusion/issues/18232
### Is your feature request related to a problem or challenge?
As noted in the comment chain here:
- https://github.com/apache/datafusion/pull/18139#discussion_r2440968965
The duration statistic reported by some of the instrumented object store's
methods, while technically accurate, can potentially be misleading for users.
E.g. the duration reported for a `put_multipart` is the time spent initiating
a multipart put session with the backing store, as opposed to the time
actually spent pushing data to it. Users would likely expect the duration to
be the latter, since that's the portion of the process where actual "work"
with the backing store is being done. Additionally, any duration-based caveats
are not readily apparent without understanding both the instrumentation code
in `datafusion` and how operations work in `object_store`.
Considering the instrumented object store is currently mostly a
development/debug utility, the above caveats are likely tolerable. However,
improving/scrutinizing the accounting for the collected and reported durations
would make the instrumented object store more useful for profiling work that
is focused strictly on the runtime duration of operations.
### Describe the solution you'd like
I would like to have additional logic added to the instrumented object store
that brings the collected and reported duration statistics in line with an
end user's expectations.
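
As a rough illustration of the kind of accounting this could mean for
`put_multipart`, here is a minimal sketch that uses only the standard library.
`MockStore`, `MockUpload`, `TimedUpload` and `run_example` are hypothetical
names made up for this example; they are not the `object_store` or
`datafusion` APIs, and the latencies are simulated with `sleep`. The idea is
that the wrapper keeps timing each part write and the final completion, so the
reported duration covers the data transfer and not just session initiation.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a backing store (not the `object_store` API).
struct MockStore {
    init_latency: Duration,
    part_latency: Duration,
}

/// Hypothetical stand-in for an in-progress multipart upload.
struct MockUpload {
    part_latency: Duration,
}

impl MockStore {
    /// Simulates initiating a multipart session with the backing store.
    fn put_multipart(&self) -> MockUpload {
        sleep(self.init_latency);
        MockUpload { part_latency: self.part_latency }
    }
}

impl MockUpload {
    /// Simulates pushing one part to the backing store.
    fn put_part(&mut self, _data: &[u8]) {
        sleep(self.part_latency);
    }

    /// Simulates finalizing the upload.
    fn complete(&mut self) {
        sleep(self.part_latency);
    }
}

/// Wrapper that accumulates time spent in every call on the upload,
/// in addition to the time spent initiating it.
struct TimedUpload {
    inner: MockUpload,
    init: Duration,
    transfer: Duration,
}

impl TimedUpload {
    fn start(store: &MockStore) -> Self {
        let started = Instant::now();
        let inner = store.put_multipart();
        Self { inner, init: started.elapsed(), transfer: Duration::ZERO }
    }

    fn put_part(&mut self, data: &[u8]) {
        let started = Instant::now();
        self.inner.put_part(data);
        self.transfer += started.elapsed();
    }

    /// Returns (initiation-only duration, initiation plus transfer duration).
    fn complete(mut self) -> (Duration, Duration) {
        let started = Instant::now();
        self.inner.complete();
        self.transfer += started.elapsed();
        (self.init, self.init + self.transfer)
    }
}

/// Uploads three parts and returns both ways of reporting the duration.
fn run_example() -> (Duration, Duration) {
    let store = MockStore {
        init_latency: Duration::from_millis(5),
        part_latency: Duration::from_millis(20),
    };
    let mut upload = TimedUpload::start(&store);
    for chunk in [b"aaaa", b"bbbb", b"cccc"] {
        upload.put_part(chunk);
    }
    upload.complete()
}
```

With these simulated latencies the first value returned by `run_example` is
roughly what is reported today (initiation only), while the second is closer to
what a user would likely expect. A real implementation would presumably need to
wrap whatever upload handle `object_store` returns and decide how to account
for parts that are uploaded concurrently, which this sketch ignores.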
### Describe alternatives you've considered
If the goal is just to make sure the reported duration stats are not
misleading, duration could be omitted from various operations (and
subsequently accounted for when computing summary statistics). This would keep
the reported statistics from being misleading, but it would also reduce the
granularity of reporting, which seems somewhat undesirable.
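
For comparison, a minimal sketch of this alternative might look like the
following. `RequestRecord`, `DurationSummary` and `summarize` are hypothetical
names for illustration only, not existing `datafusion` types. Operations whose
measured time would be misleading carry no duration, and the summary skips
them while still noting which operations were left out.

```rust
use std::time::Duration;

/// Hypothetical record of one instrumented request (not a DataFusion type).
struct RequestRecord {
    op: &'static str,
    /// `None` when the measured time would not reflect the real work.
    duration: Option<Duration>,
}

/// Summary computed only over records that carry a duration.
struct DurationSummary {
    timed: usize,
    total: Duration,
    /// Operations that were excluded from the duration statistics.
    untimed_ops: Vec<&'static str>,
}

impl DurationSummary {
    /// Average over timed requests, or `None` if nothing was timed.
    fn average(&self) -> Option<Duration> {
        if self.timed == 0 {
            None
        } else {
            Some(self.total / self.timed as u32)
        }
    }
}

fn summarize(records: &[RequestRecord]) -> DurationSummary {
    let mut summary = DurationSummary {
        timed: 0,
        total: Duration::ZERO,
        untimed_ops: Vec::new(),
    };
    for record in records {
        match record.duration {
            Some(d) => {
                summary.timed += 1;
                summary.total += d;
            }
            // Excluded from totals and averages, but still visible.
            None => summary.untimed_ops.push(record.op),
        }
    }
    summary
}
```

The loss of granularity mentioned above shows up directly: any operation
recorded with `None` contributes nothing to the totals or averages.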
### Additional context
_No response_
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]