alamb commented on code in PR #12585: URL: https://github.com/apache/datafusion/pull/12585#discussion_r1773544179
########## datafusion/core/src/datasource/physical_plan/parquet/opener.rs: ########## @@ -187,15 +190,18 @@ impl FileOpener for ParquetOpener { } // If there is a predicate that can be evaluated against the metadata if let Some(predicate) = predicate.as_ref() { + let mut timer = file_metrics.statistics_eval_time.timer(); row_groups.prune_by_statistics( &file_schema, builder.parquet_schema(), rg_metadata, predicate, &file_metrics, ); + timer.stop(); if enable_bloom_filter && !row_groups.is_empty() { + let mut timer = file_metrics.bloom_filter_eval_time.timer(); Review Comment: I note that the time calculation for `page_index_eval_time` below seems to be in the call to ```rust access_plan = p.prune_plan_with_page_index( access_plan, &file_schema, builder.parquet_schema(), file_metadata.as_ref(), &file_metrics, ); ``` I wonder if we should move the timer into this function for consistency (or alternatively push the timer calculation into `prune_by_bloom_filters` and `prune_by_statistics`) ########## datafusion/core/src/datasource/physical_plan/parquet/metrics.rs: ########## @@ -43,14 +43,20 @@ pub struct ParquetFileMetrics { pub pushdown_rows_pruned: Count, /// Total rows passed predicates pushed into parquet scan pub pushdown_rows_matched: Count, - /// Total time spent evaluating pushdown filters - pub pushdown_eval_time: Time, + /// Total time spent evaluating row-level pushdown filters Review Comment: As an aside while I am reviewing this code, the description of what the metrics mean would be more useful if we could attach them to the Metric itself somehow 🤔 Something like: ```rust let row_pushdown_eval_time = MetricBuilder::new(metrics) .with_description("Total time spent evaluating row-level pushdown filters") .with_new_label("filename", filename.to_string()) .subset_time("row_pushdown_eval_time", partition); ``` Though then we would have to figure out how to expose that information in an explain plan 🤔 ########## datafusion/core/src/datasource/physical_plan/parquet/opener.rs: ########## @@ -187,15 +190,18 @@ impl FileOpener for ParquetOpener { } // If there is a predicate that can be evaluated against the metadata if let Some(predicate) = predicate.as_ref() { + let mut timer = file_metrics.statistics_eval_time.timer(); row_groups.prune_by_statistics( &file_schema, builder.parquet_schema(), rg_metadata, predicate, &file_metrics, ); + timer.stop(); Review Comment: I don't think it is a problem, but technically speaking the `stop()` is unecessary (it happens automatically when `timer` goes out of scope and is `Drop`ed https://github.com/alamb/datafusion/blob/544e49bb0acac7130a873a92b44e1c902e41ac8f/datafusion/physical-plan/src/metrics/value.rs#L335-L339 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org