alamb commented on code in PR #12585:
URL: https://github.com/apache/datafusion/pull/12585#discussion_r1773544179


##########
datafusion/core/src/datasource/physical_plan/parquet/opener.rs:
##########
@@ -187,15 +190,18 @@ impl FileOpener for ParquetOpener {
             }
             // If there is a predicate that can be evaluated against the 
metadata
             if let Some(predicate) = predicate.as_ref() {
+                let mut timer = file_metrics.statistics_eval_time.timer();
                 row_groups.prune_by_statistics(
                     &file_schema,
                     builder.parquet_schema(),
                     rg_metadata,
                     predicate,
                     &file_metrics,
                 );
+                timer.stop();
 
                 if enable_bloom_filter && !row_groups.is_empty() {
+                    let mut timer = 
file_metrics.bloom_filter_eval_time.timer();

Review Comment:
   I note that the time calculation for `page_index_eval_time` below seems to 
be in the call to
   
   ```rust
                       access_plan = p.prune_plan_with_page_index(
                           access_plan,
                           &file_schema,
                           builder.parquet_schema(),
                           file_metadata.as_ref(),
                           &file_metrics,
                       );
   ```
   
   I wonder if we should move the timer into this function for consistency (or 
alternatively push the timer calculation into `prune_by_bloom_filters` and 
`prune_by_statistics`)



##########
datafusion/core/src/datasource/physical_plan/parquet/metrics.rs:
##########
@@ -43,14 +43,20 @@ pub struct ParquetFileMetrics {
     pub pushdown_rows_pruned: Count,
     /// Total rows passed predicates pushed into parquet scan
     pub pushdown_rows_matched: Count,
-    /// Total time spent evaluating pushdown filters
-    pub pushdown_eval_time: Time,
+    /// Total time spent evaluating row-level pushdown filters

Review Comment:
   As an aside while I am reviewing this code, the description of what the 
metrics mean would be more useful if we could attach them to the Metric itself 
somehow 🤔 
   
   Something like:
   ```rust
           let row_pushdown_eval_time = MetricBuilder::new(metrics)
               .with_description("Total time spent evaluating row-level 
pushdown filters")
               .with_new_label("filename", filename.to_string()) 
               .subset_time("row_pushdown_eval_time", partition);
   ```
   
   Though then we would have to figure out how to expose that information in an 
explain plan 🤔 
   
   



##########
datafusion/core/src/datasource/physical_plan/parquet/opener.rs:
##########
@@ -187,15 +190,18 @@ impl FileOpener for ParquetOpener {
             }
             // If there is a predicate that can be evaluated against the 
metadata
             if let Some(predicate) = predicate.as_ref() {
+                let mut timer = file_metrics.statistics_eval_time.timer();
                 row_groups.prune_by_statistics(
                     &file_schema,
                     builder.parquet_schema(),
                     rg_metadata,
                     predicate,
                     &file_metrics,
                 );
+                timer.stop();

Review Comment:
   I don't think it is a problem, but technically speaking the `stop()` is 
unecessary (it happens automatically when `timer` goes out of scope and is 
`Drop`ed
   
   
https://github.com/alamb/datafusion/blob/544e49bb0acac7130a873a92b44e1c902e41ac8f/datafusion/physical-plan/src/metrics/value.rs#L335-L339



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to