xudong963 opened a new pull request, #22406:
URL: https://github.com/apache/datafusion/pull/22406

   ## Which issue does this PR close?
   
   - Part of #22189.
   
   ## Rationale for this change
   
   `ParquetFileMetrics::new` registers many per-file metrics with the same
   `filename` label. Before this PR, each metric built its own owned filename
   label with `filename.to_string()`, which repeatedly copied the same dynamic
   string during parquet scan setup.
   
   This PR keeps parquet metrics eagerly registered, so `ExecutionPlan::metrics()`
   visibility during execution is unchanged, while reducing repeated label string
   allocation and copying.
   
   ## What changes are included in this PR?
   
   - Store owned `Label` name/value strings behind `Arc<str>` internally, while
     keeping borrowed static label strings allocation-free.
   - Reuse one cloned `filename` label across the per-file parquet metrics in
     `ParquetFileMetrics::new`.
   - Add a metrics test confirming borrowed and owned label values remain equal
     and display the same way.
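   
   As a rough illustration of the idea in the list above (not the actual
   DataFusion code), the sketch below models a label whose strings are either
   borrowed `&'static str` or shared `Arc<str>`. The names `LabelStr`,
   `file_labels`, and `shares_allocation`, and the `name=value` display format,
   are assumptions made for this example only.
   
   ```rust
   use std::fmt;
   use std::sync::Arc;
   
   /// Hypothetical label string: borrowed statics stay allocation-free,
   /// owned strings are stored once behind an `Arc<str>`.
   #[derive(Clone, Debug)]
   enum LabelStr {
       Static(&'static str),
       Shared(Arc<str>),
   }
   
   impl LabelStr {
       fn as_str(&self) -> &str {
           match self {
               LabelStr::Static(s) => *s,
               LabelStr::Shared(s) => &**s,
           }
       }
   }
   
   // Equality compares contents, so borrowed and owned values can be equal.
   impl PartialEq for LabelStr {
       fn eq(&self, other: &Self) -> bool {
           self.as_str() == other.as_str()
       }
   }
   
   impl From<&'static str> for LabelStr {
       fn from(s: &'static str) -> Self {
           LabelStr::Static(s)
       }
   }
   
   impl From<String> for LabelStr {
       fn from(s: String) -> Self {
           LabelStr::Shared(Arc::from(s))
       }
   }
   
   /// Hypothetical stand-in for a metric label (name/value pair).
   #[derive(Clone, Debug, PartialEq)]
   struct Label {
       name: LabelStr,
       value: LabelStr,
   }
   
   impl Label {
       fn new(name: impl Into<LabelStr>, value: impl Into<LabelStr>) -> Self {
           Label { name: name.into(), value: value.into() }
       }
   }
   
   impl fmt::Display for Label {
       fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
           write!(f, "{}={}", self.name.as_str(), self.value.as_str())
       }
   }
   
   /// Build the filename label once, then hand out cheap clones:
   /// each clone bumps the `Arc` refcount instead of copying the string.
   fn file_labels(filename: &str, count: usize) -> Vec<Label> {
       let label = Label::new("filename", filename.to_string());
       (0..count).map(|_| label.clone()).collect()
   }
   
   /// True when both labels point at the same shared value allocation.
   fn shares_allocation(a: &Label, b: &Label) -> bool {
       match (&a.value, &b.value) {
           (LabelStr::Shared(x), LabelStr::Shared(y)) => Arc::ptr_eq(x, y),
           _ => false,
       }
   }
   
   fn main() {
       let labels = file_labels("part-0.parquet", 3);
       assert!(shares_allocation(&labels[0], &labels[2]));
       println!("{}", labels[0]);
   }
   ```
   
   In this sketch the filename is copied into the heap once per file, and every
   additional metric label for that file is a reference-count increment.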
   
   ## Are these changes tested?
   
   Yes.
   
   ```text
   cargo fmt --all
   cargo test -p datafusion-physical-expr-common metrics::tests
   cargo test -p datafusion-datasource-parquet --lib
   cargo clippy -p datafusion-physical-expr-common -p datafusion-datasource-parquet --lib -- -D warnings
   cargo clippy --all-targets --all-features -- -D warnings
   git diff --check
   ```
   
   I also ran a local targeted microbenchmark for repeated
   `ParquetFileMetrics::new` construction:
   
   ```text
   origin/main, 50k iterations x 9 samples:
   median = 66.223 ms
   rerun median = 67.423 ms
   
   this PR, 50k iterations x 9 samples:
   median = 59.283 ms
   ```
   
   That is about 10-12% faster for this targeted metric construction path.
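   
   The 10-12% range follows from comparing the PR median against each of the
   two `origin/main` medians. A small sketch of that arithmetic, where
   `percent_faster` is a helper name invented for this example:
   
   ```rust
   /// Relative reduction in elapsed time from `baseline_ms` to `new_ms`, in percent.
   fn percent_faster(baseline_ms: f64, new_ms: f64) -> f64 {
       (baseline_ms - new_ms) / baseline_ms * 100.0
   }
   
   fn main() {
       // Medians taken from the benchmark output above.
       let pr = 59.283;
       println!("vs first run: {:.1}%", percent_faster(66.223, pr));
       println!("vs rerun:     {:.1}%", percent_faster(67.423, pr));
   }
   ```
   
   This gives roughly 10.5% against the first run and 12.1% against the rerun.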
   
   ## Are there any user-facing changes?
   
   No. Metric registration timing and displayed label values are unchanged.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

