ngli-me commented on code in PR #6758:
URL: https://github.com/apache/arrow-rs/pull/6758#discussion_r1903200204


##########
arrow-array/src/record_batch.rs:
##########
@@ -394,6 +396,104 @@ impl RecordBatch {
         )
     }
 
+    /// Normalize a semi-structured [`RecordBatch`] into a flat table.
+    ///
+    /// `separator`: Nested [`Field`]s will generate names separated by `separator`, e.g. for
+    /// separator= "." and the schema:
+    /// ```text
+    ///     "foo": StructArray<"bar": Utf8>
+    /// ```
+    /// will generate:
+    /// ```text
+    ///     "foo.bar": Utf8
+    /// ```
+    /// `max_level`: The maximum number of levels (depth of the `Schema` and `Columns`) to
+    /// normalize. If `0`, normalizes all levels.
+    ///
+    /// # Example
+    ///
+    /// ```
+    /// # use std::sync::Arc;
+    /// # use arrow_array::{ArrayRef, Int64Array, StringArray, StructArray, RecordBatch};
+    /// # use arrow_schema::{DataType, Field, Fields, Schema};
+    ///
+    /// let animals: ArrayRef = Arc::new(StringArray::from(vec!["Parrot", ""]));
+    /// let n_legs: ArrayRef = Arc::new(Int64Array::from(vec![Some(2), Some(4)]));
+    ///
+    /// let animals_field = Arc::new(Field::new("animals", DataType::Utf8, true));
+    /// let n_legs_field = Arc::new(Field::new("n_legs", DataType::Int64, true));
+    ///
+    /// let a = Arc::new(StructArray::from(vec![
+    ///     (animals_field.clone(), Arc::new(animals.clone()) as ArrayRef),
+    ///     (n_legs_field.clone(), Arc::new(n_legs.clone()) as ArrayRef),
+    /// ]));
+    ///
+    /// let schema = Schema::new(vec![
+    ///     Field::new(
+    ///         "a",
+    ///         DataType::Struct(Fields::from(vec![animals_field, n_legs_field])),
+    ///         false,
+    ///     )
+    /// ]);
+    ///
+    /// let normalized = RecordBatch::try_new(Arc::new(schema), vec![a])
+    ///     .expect("valid conversion")
+    ///     .normalize(".", 0)
+    ///     .expect("valid normalization");
+    ///
+    /// let expected = RecordBatch::try_from_iter_with_nullable(vec![
+    ///     ("a.animals", animals.clone(), true),
+    ///     ("a.n_legs", n_legs.clone(), true),
+    /// ])
+    /// .expect("valid conversion");
+    ///
+    /// assert_eq!(expected, normalized);
+    /// ```
+    pub fn normalize(&self, separator: &str, mut max_level: usize) -> Result<Self, ArrowError> {
+        if max_level == 0 {
+            max_level = usize::MAX;
+        }

Review Comment:
   Okay, I've been working on this a bit and found a few possible solutions that might fit. I don't think `Option` is the best choice: personally, the case of `Some(0)` feels weird to me, and it would mean doing an annoying copy for no reason. To avoid that I would want to add an if statement to catch it, but then we end up in the same place.
   
   For `RecordBatch`, the following seems to fit idiomatic Rust better. Unfortunately, the same solution can't be carried over to `Schema` without an additional dependency, and I'm not sure how I feel about that.
   ``` rust
   max_level.is_zero().then(|| max_level = usize::MAX);
   ```
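
   As a rough sketch of a dependency-free variant, a plain comparison would behave the same way for both `RecordBatch` and `Schema`. This assumes `is_zero` above comes from the `num` crate's `Zero` trait, and `resolve_max_level` is a hypothetical helper name that is not part of the PR:

   ```rust
   /// Hypothetical helper, not part of arrow-rs: resolve the `0` sentinel
   /// to "no depth limit" using only the standard library.
   fn resolve_max_level(max_level: usize) -> usize {
       // A plain comparison stands in for `is_zero()`, so no trait import is needed.
       if max_level == 0 {
           usize::MAX
       } else {
           max_level
       }
   }
   ```

   It is less compact than the one-liner, but it avoids mutating the parameter and needs nothing outside `std`.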
   
   Another option is to use something like `NonZeroUsize`, which I just learned about. My issue with this one is that it makes the `normalize` call more cumbersome, since you have to construct the argument with something like
   ``` rust
   NonZeroUsize::new(1)
   ```
   The call gets longer, but it means there wouldn't be another import.
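
   To make the call-site cost concrete, here is a minimal sketch. `depth_limit` is a hypothetical stand-in for `normalize` that only returns the depth it would use, and taking `Option<NonZeroUsize>` is an assumption for illustration, not something the PR does:

   ```rust
   use std::num::NonZeroUsize;

   /// Hypothetical stand-in for `normalize`: returns only the depth it would use.
   fn depth_limit(max_level: Option<NonZeroUsize>) -> usize {
       // `None` means "normalize all levels"; a zero depth is unrepresentable.
       max_level.map_or(usize::MAX, NonZeroUsize::get)
   }
   ```

   With this shape, `NonZeroUsize::new(1)` can be passed directly because it already returns an `Option`, and `NonZeroUsize::new(0)` collapses to `None`, i.e. no limit.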
   
   Any thoughts on these, or do you disagree?
   



