alamb commented on code in PR #17248:
URL: https://github.com/apache/datafusion/pull/17248#discussion_r2294465105


##########
datafusion/expr/src/expr.rs:
##########
@@ -599,6 +599,65 @@ impl From<&HashMap<String, String>> for FieldMetadata {
     }
 }
 
+/// The metadata used in [`Field::metadata`].
+///
+/// This represents the metadata associated with an Arrow [`Field`]. The metadata consists of key-value pairs.
+///
+/// # Common Use Cases
+///
+/// Field metadata is commonly used to store:
+/// - Default values for columns when data is missing
+/// - Column descriptions or documentation
+/// - Data lineage information
+/// - Custom application-specific annotations
+/// - Encoding hints or display formatting preferences
+///
+/// # Example: Storing Default Values
+///
+/// A practical example of using field metadata is storing default values for columns
+/// that may be missing in the physical data but present in the logical schema.
+/// See the [default_column_values.rs] example implementation.
+///
+/// [default_column_values.rs]: https://github.com/apache/datafusion/blob/main/datafusion-examples/examples/default_column_values.rs
+pub type SchemaFieldMetadata = std::collections::HashMap<String, String>;
+
+/// Intersects multiple metadata instances for UNION operations.
+///
+/// This function implements the intersection strategy used by UNION operations,
+/// where only metadata keys that exist in ALL inputs with identical values
+/// are preserved in the result.
+///
+/// # Union Metadata Behavior
+///
+/// Union operations require consistent metadata across all branches:
+/// - Only metadata keys present in ALL union branches are kept
+/// - For each kept key, the value must be identical across all branches
+/// - If a key has different values across branches, it is excluded from the result
+/// - If any input has no metadata, the result will be empty
+///
+/// # Arguments
+///
+/// * `metadatas` - An iterator of `SchemaFieldMetadata` instances to intersect
+///
+/// # Returns
+///
+/// A new `SchemaFieldMetadata` containing only the intersected metadata
+pub fn intersect_for_union<'a>(

Review Comment:
   Given that this operates on metadata, I think we should have `metadata` in the name
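
   To make the intersection behavior described in the doc comment concrete, here is a minimal standalone sketch. The name `intersect_metadata_for_union` is only a hypothetical example of a name containing "metadata", not one chosen in the PR, and the body is an assumption modeled on the `intersect_maps` logic quoted later in this message rather than the PR's actual implementation.

```rust
use std::collections::HashMap;

/// Stand-in for the type alias introduced in the diff above.
pub type SchemaFieldMetadata = HashMap<String, String>;

/// Hypothetical name; the PR currently calls this `intersect_for_union`.
pub fn intersect_metadata_for_union<'a>(
    metadatas: impl IntoIterator<Item = &'a SchemaFieldMetadata>,
) -> SchemaFieldMetadata {
    let mut metadatas = metadatas.into_iter();
    // Start from the first input, or an empty map if there are no inputs.
    let mut merged = metadatas.next().cloned().unwrap_or_default();
    for metadata in metadatas {
        // Keep a key only if this input maps it to the same value.
        merged.retain(|k, v| metadata.get(k) == Some(&*v));
    }
    merged
}

fn main() {
    let a = HashMap::from([
        ("unit".to_string(), "ms".to_string()),
        ("source".to_string(), "a".to_string()),
    ]);
    let b = HashMap::from([
        ("unit".to_string(), "ms".to_string()),
        ("source".to_string(), "b".to_string()),
    ]);

    // "unit" matches in both inputs; "source" differs, so it is dropped.
    let merged = intersect_metadata_for_union([&a, &b]);
    assert_eq!(merged.len(), 1);
    assert_eq!(merged.get("unit").map(String::as_str), Some("ms"));

    // An input with no metadata empties the result.
    let empty = SchemaFieldMetadata::new();
    assert!(intersect_metadata_for_union([&a, &empty]).is_empty());
}
```

   The sketch starts from the first input and retains only matching entries, so an input with no metadata, or no inputs at all, yields an empty map.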



##########
datafusion/expr/src/expr.rs:
##########
@@ -599,6 +599,46 @@ impl From<&HashMap<String, String>> for FieldMetadata {
     }
 }
 
+/// The metadata used in [`Field::metadata`].

Review Comment:
   As we move to working with more field metadata, maybe we should move this function into its own module, `datafusion/expr/src/metadata.rs` or something 🤔

   That could be done as a follow-on PR.



##########
datafusion/expr/src/logical_plan/plan.rs:
##########
@@ -2896,16 +2899,7 @@ impl Union {
 fn intersect_maps<'a>(
     inputs: impl IntoIterator<Item = &'a HashMap<String, String>>,
 ) -> HashMap<String, String> {
-    let mut inputs = inputs.into_iter();
-    let mut merged: HashMap<String, String> = inputs.next().cloned().unwrap_or_default();
-    for input in inputs {
-        // The extra dereference below (`&*v`) is a workaround for https://github.com/rkyv/rkyv/issues/434.
-        // When this crate is used in a workspace that enables the `rkyv-64` feature in the `chrono` crate,
-        // this triggers a Rust compilation error:
-        // error[E0277]: can't compare `Option<&std::string::String>` with `Option<&mut std::string::String>`.
-        merged.retain(|k, v| input.get(k) == Some(&*v));
-    }
-    merged
+    intersect_for_union(inputs)

Review Comment:
   Since this function just calls another one, I think we could remove it and call the other one directly



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org