alamb commented on code in PR #22406:
URL: https://github.com/apache/datafusion/pull/22406#discussion_r3280709122


##########
datafusion/datasource-parquet/src/metrics.rs:
##########
@@ -100,46 +100,44 @@ impl ParquetFileMetrics {
         filename: &str,
         metrics: &ExecutionPlanMetricsSet,
     ) -> Self {
+        // Share the filename label across all per-file metrics to avoid
+        // allocating the same filename string for each metric.
+        let filename_label = Label::new("filename", filename.to_string());
+        let builder = || 
MetricBuilder::new(metrics).with_label(filename_label.clone());

Review Comment:
   Rather than an inline closure, it might be simpler if you just cloned the 
builder. Something like
   
   ```rust
   let builder = MetricBuilder::new(metrics).with_label(filename_label.clone());
   ```
   
   And then 
   ```rust
           let row_groups_pruned_bloom_filter = builder
               .clone() // <---- clone the builder rather than re-create it?
               .with_type(MetricType::Summary)
               .pruning_metrics("row_groups_pruned_bloom_filter", partition);
   ```



##########
datafusion/physical-expr-common/src/metrics/mod.rs:
##########
@@ -533,19 +536,19 @@ impl Label {
         name: impl Into<Cow<'static, str>>,
         value: impl Into<Cow<'static, str>>,
     ) -> Self {
-        let name = name.into();
-        let value = value.into();
+        let name = LabelValue::from(name.into());

Review Comment:
   this is quite a nice way to avoid any public API changes 👍 



##########
datafusion/physical-expr-common/src/metrics/mod.rs:
##########
@@ -555,6 +558,54 @@ impl Display for Label {
     }
 }
 
+#[derive(Clone, Eq)]
+enum LabelValue {

Review Comment:
   Can we add some comments here explaining the important properties here 
(specifically that clone'ing is very cheap / has no allocations)?



##########
datafusion/physical-expr-common/src/metrics/mod.rs:
##########
@@ -533,19 +536,19 @@ impl Label {
         name: impl Into<Cow<'static, str>>,
         value: impl Into<Cow<'static, str>>,

Review Comment:
   We could potentially avoid the extra Arc allocation below by changing this 
sigature to 
   
   ```rust
   pub fn new(
           name: impl Into<LabelValue>,
           value: impl Into<LabelValue>,
   ```
   
   And then implement `From<'static str> for LabelValue`, `From<Arc<str>> for 
LabelValue` and `From<String> for LabelValue` and `From<Cow<...>> for 
LabelValue`
   
   🤔 
   
   That way the API is the same, but you could create the Arc directly at the 
callsite



##########
datafusion/physical-expr-common/src/metrics/mod.rs:
##########
@@ -555,6 +558,54 @@ impl Display for Label {
     }
 }
 
+#[derive(Clone, Eq)]
+enum LabelValue {
+    Static(&'static str),
+    Shared(Arc<str>),
+}
+
+impl LabelValue {
+    fn as_str(&self) -> &str {
+        match self {
+            Self::Static(value) => value,
+            Self::Shared(value) => value.as_ref(),
+        }
+    }
+}
+
+impl From<Cow<'static, str>> for LabelValue {
+    fn from(value: Cow<'static, str>) -> Self {
+        match value {
+            Cow::Borrowed(value) => Self::Static(value),
+            Cow::Owned(value) => Self::Shared(Arc::from(value)),

Review Comment:
   This line is my only concern -- it requires an extra allocation (for the 
Arc) and deallocation (for the String) now. 
   
   We can probably remove that overhead as a follow on PR eventually.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to