Jefffrey commented on code in PR #6758:
URL: https://github.com/apache/arrow-rs/pull/6758#discussion_r1903204581


##########
arrow-array/src/record_batch.rs:
##########
@@ -394,6 +396,104 @@ impl RecordBatch {
         )
     }
 
+    /// Normalize a semi-structured [`RecordBatch`] into a flat table.
+    ///
+    /// `separator`: Nested [`Field`]s will generate names separated by 
`separator`, e.g. for
+    /// separator= "." and the schema:
+    /// ```text
+    ///     "foo": StructArray<"bar": Utf8>
+    /// ```
+    /// will generate:
+    /// ```text
+    ///     "foo.bar": Utf8
+    /// ```
+    /// `max_level`: The maximum number of levels (depth of the `Schema` and 
`Columns`) to
+    /// normalize. If `0`, normalizes all levels.
+    ///
+    /// # Example
+    ///
+    /// ```
+    /// # use std::sync::Arc;
+    /// # use arrow_array::{ArrayRef, Int64Array, StringArray, StructArray, 
RecordBatch};
+    /// # use arrow_schema::{DataType, Field, Fields, Schema};
+    ///
+    /// let animals: ArrayRef = Arc::new(StringArray::from(vec!["Parrot", 
""]));
+    /// let n_legs: ArrayRef = Arc::new(Int64Array::from(vec![Some(2), 
Some(4)]));
+    ///
+    /// let animals_field = Arc::new(Field::new("animals", DataType::Utf8, 
true));
+    /// let n_legs_field = Arc::new(Field::new("n_legs", DataType::Int64, 
true));
+    ///
+    /// let a = Arc::new(StructArray::from(vec![
+    ///     (animals_field.clone(), Arc::new(animals.clone()) as ArrayRef),
+    ///     (n_legs_field.clone(), Arc::new(n_legs.clone()) as ArrayRef),
+    /// ]));
+    ///
+    /// let schema = Schema::new(vec![
+    ///     Field::new(
+    ///         "a",
+    ///         DataType::Struct(Fields::from(vec![animals_field, 
n_legs_field])),
+    ///         false,
+    ///     )
+    /// ]);
+    ///
+    /// let normalized = RecordBatch::try_new(Arc::new(schema), vec![a])
+    ///     .expect("valid conversion")
+    ///     .normalize(".", 0)
+    ///     .expect("valid normalization");
+    ///
+    /// let expected = RecordBatch::try_from_iter_with_nullable(vec![
+    ///     ("a.animals", animals.clone(), true),
+    ///     ("a.n_legs", n_legs.clone(), true),
+    /// ])
+    /// .expect("valid conversion");
+    ///
+    /// assert_eq!(expected, normalized);
+    /// ```
+    pub fn normalize(&self, separator: &str, mut max_level: usize) -> 
Result<Self, ArrowError> {
+        if max_level == 0 {
+            max_level = usize::MAX;
+        }

Review Comment:
   > the case of Some(0) feels weird to me, and would mean you're doing an 
annoying copy for no reason (because of that, I would want to add in an if 
statement to catch it, but then we end up in the same place).
   
   Personally I find this okay. I'm less concerned with requiring an if check 
inside the code (its pretty simple anyway) compared to presenting a more 
Rust-like interface to users.
   
   > but unfortunately the same solution can't be echoed over to `Schema` 
without an additional dependency, not sure how I feel about that.
   
   I don't follow this, the `Schema` code looks almost identical to the 
`RecordBatch` version.
   
   I agree with `NonZeroUsize` potentially being a bit clunky for users to use 
(personally wasn't aware this was part of the stdlib either).
   
   But yeah I'm curious to see what others might think for this too.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to