alamb commented on code in PR #12585:
URL: https://github.com/apache/datafusion/pull/12585#discussion_r1773544179
##########
datafusion/core/src/datasource/physical_plan/parquet/opener.rs:
##########
@@ -187,15 +190,18 @@ impl FileOpener for ParquetOpener {
}
// If there is a predicate that can be evaluated against the
metadata
if let Some(predicate) = predicate.as_ref() {
+ let mut timer = file_metrics.statistics_eval_time.timer();
row_groups.prune_by_statistics(
&file_schema,
builder.parquet_schema(),
rg_metadata,
predicate,
&file_metrics,
);
+ timer.stop();
if enable_bloom_filter && !row_groups.is_empty() {
+ let mut timer =
file_metrics.bloom_filter_eval_time.timer();
Review Comment:
I note that the time calculation for `page_index_eval_time` below seems to
be in the call to
```rust
access_plan = p.prune_plan_with_page_index(
access_plan,
&file_schema,
builder.parquet_schema(),
file_metadata.as_ref(),
&file_metrics,
);
```
I wonder if we should move the timer into this function for consistency (or
alternatively push the timer calculation into `prune_by_bloom_filters` and
`prune_by_statistics`)
##########
datafusion/core/src/datasource/physical_plan/parquet/metrics.rs:
##########
@@ -43,14 +43,20 @@ pub struct ParquetFileMetrics {
pub pushdown_rows_pruned: Count,
/// Total rows passed predicates pushed into parquet scan
pub pushdown_rows_matched: Count,
- /// Total time spent evaluating pushdown filters
- pub pushdown_eval_time: Time,
+ /// Total time spent evaluating row-level pushdown filters
Review Comment:
As an aside while I am reviewing this code, the description of what the
metrics mean would be more useful if we could attach them to the Metric itself
somehow 🤔
Something like:
```rust
let row_pushdown_eval_time = MetricBuilder::new(metrics)
.with_description("Total time spent evaluating row-level
pushdown filters")
.with_new_label("filename", filename.to_string())
.subset_time("row_pushdown_eval_time", partition);
```
Though then we would have to figure out how to expose that information in an
explain plan 🤔
##########
datafusion/core/src/datasource/physical_plan/parquet/opener.rs:
##########
@@ -187,15 +190,18 @@ impl FileOpener for ParquetOpener {
}
// If there is a predicate that can be evaluated against the
metadata
if let Some(predicate) = predicate.as_ref() {
+ let mut timer = file_metrics.statistics_eval_time.timer();
row_groups.prune_by_statistics(
&file_schema,
builder.parquet_schema(),
rg_metadata,
predicate,
&file_metrics,
);
+ timer.stop();
Review Comment:
I don't think it is a problem, but technically speaking the `stop()` is
unecessary (it happens automatically when `timer` goes out of scope and is
`Drop`ed
https://github.com/alamb/datafusion/blob/544e49bb0acac7130a873a92b44e1c902e41ac8f/datafusion/physical-plan/src/metrics/value.rs#L335-L339
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]