BlakeOrth opened a new pull request, #18045:
URL: https://github.com/apache/datafusion/pull/18045
## Which issue does this PR close?
This PR does not fully close the issue below, but it is an incremental
building block for:
- https://github.com/apache/datafusion/issues/17207
The full context of how this code is likely to progress can be seen in the
POC for this effort:
- https://github.com/apache/datafusion/pull/17266
## Rationale for this change
For particularly large workloads, whether a table contains many objects or the
objects themselves are large, a single query may issue a large number of object
store operations. In these cases, aggregate statistics are likely the best way
to understand the impact those operations had on a particular query. This PR
allows users of an instrumented object store to compute and display basic
summary statistics for the `RequestDetails` collected during a query.
## What changes are included in this PR?
- Adds a `RequestSummary` type for the instrumented object store to display
summary statistics about instrumented requests
- Adds a generic `Stats` type to track the statistics for the summary
- Adds tests for the new code
- Adds a basic summary output to the user-facing display when profiling is
enabled
- Adds docs for new and newly exported public items
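
To illustrate the idea of a generic statistics type, here is a minimal sketch. It is not the code added by this PR: the field names, the `push`, `from_values`, and `avg` methods, and the trait bounds are all assumptions made for illustration.

```rust
use std::ops::Add;
use std::time::Duration;

/// Hypothetical running statistics over values of type `T`.
/// Illustrative only; the type added in this PR may look different.
#[derive(Debug, Clone, PartialEq)]
struct Stats<T> {
    count: usize,
    min: T,
    max: T,
    sum: T,
}

impl<T: Copy + PartialOrd + Add<Output = T>> Stats<T> {
    /// Starts the statistics from a first observed value.
    fn new(first: T) -> Self {
        Stats { count: 1, min: first, max: first, sum: first }
    }

    /// Folds one more observation into the running statistics.
    fn push(&mut self, value: T) {
        if value < self.min {
            self.min = value;
        }
        if value > self.max {
            self.max = value;
        }
        self.sum = self.sum + value;
        self.count += 1;
    }

    /// Builds statistics from a sequence; returns `None` when it is empty.
    fn from_values(values: impl IntoIterator<Item = T>) -> Option<Self> {
        let mut iter = values.into_iter();
        let mut stats = Self::new(iter.next()?);
        for value in iter {
            stats.push(value);
        }
        Some(stats)
    }
}

impl Stats<u64> {
    /// Integer average of sizes in bytes (truncating division).
    fn avg(&self) -> u64 {
        self.sum / self.count as u64
    }
}

impl Stats<Duration> {
    /// Average duration; assumes the count fits in a `u32`.
    fn avg(&self) -> Duration {
        self.sum / self.count as u32
    }
}
```

Under these assumptions, `Stats::from_values(vec![8u64, 34322])` yields a minimum of 8, a maximum of 34322, a sum of 34330, and an average of 17165. Keeping `avg` in type-specific `impl` blocks avoids requiring a generic division bound on `T`.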
## Are these changes tested?
Yes. The new functionality is covered by tests, except for the actual display
output. The functional output can be seen below:
```sql
DataFusion CLI v50.1.0
> \object_store_profiling enabled
ObjectStore Profile mode set to Enabled
> CREATE EXTERNAL TABLE hits
STORED AS PARQUET
LOCATION 'https://datasets.clickhouse.com/hits_compatible/athena_partitioned/hits_1.parquet';
0 row(s) fetched.
Elapsed 0.268 seconds.
Object Store Profiling
Instrumented Object Store: instrument_mode: Enabled, inner: HttpStore
2025-10-13T22:15:50.518465131+00:00 operation=Get duration=0.030742s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-13T22:15:50.549263341+00:00 operation=Get duration=0.033060s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
Summaries:
Get
count: 2
duration min: 0.030742s
duration max: 0.033060s
duration avg: 0.031901s
size min: 8 B
size max: 34322 B
size avg: 17165 B
size sum: 34330 B
>
```
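
The per-operation grouping shown under `Summaries:` can be sketched roughly as follows. This is an assumption-laden illustration, not the PR's implementation: the `Request` and `SizeSummary` types and the `summarize_sizes` function are hypothetical stand-ins, and skipping requests that have no size is an assumed policy.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for one logged request; not the PR's `RequestDetails`.
struct Request {
    operation: &'static str,
    size: Option<u64>,
}

/// Hypothetical per-operation size summary, in bytes.
#[derive(Debug, PartialEq)]
struct SizeSummary {
    count: usize,
    min: u64,
    max: u64,
    sum: u64,
}

/// Groups requests by operation name and summarizes their sizes.
/// Requests without a size are skipped (an assumption of this sketch).
fn summarize_sizes(requests: &[Request]) -> BTreeMap<&'static str, SizeSummary> {
    let mut summaries = BTreeMap::new();
    for request in requests {
        let Some(size) = request.size else { continue };
        summaries
            .entry(request.operation)
            .and_modify(|s: &mut SizeSummary| {
                s.count += 1;
                s.min = s.min.min(size);
                s.max = s.max.max(size);
                s.sum += size;
            })
            .or_insert(SizeSummary { count: 1, min: size, max: size, sum: size });
    }
    summaries
}
```

Feeding in the two `Get` requests from the output above (8 B and 34322 B) produces a single `Get` entry with a count of 2 and a sum of 34330 B, from which the 17165 B average follows as `sum / count`.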
## Are there any user-facing changes?
Yes. Like the previous PR, this changes the user-facing output, but there are
no breaking API changes.
cc @alamb