scovich commented on issue #6528:
URL: https://github.com/apache/arrow-rs/issues/6528#issuecomment-2657044367

   FWIW, this is the variant I ended up using, since I needed the null mask unions to propagate recursively through the whole struct:
   ```rust
   // Imports assume the `arrow` umbrella crate.
   use std::sync::Arc;
   use arrow::array::{make_array, Array, AsArray, StructArray};
   use arrow::buffer::NullBuffer;

   /// Splits a StructArray into its parts, unions in the parent null mask, and uses the result to
   /// recursively update the children as well before putting everything back together.
   fn compute_nested_null_masks(sa: StructArray, parent_nulls: Option<&NullBuffer>) -> StructArray {
       let (fields, columns, nulls) = sa.into_parts();
       let nulls = NullBuffer::union(parent_nulls, nulls.as_ref());
       let columns = columns
           .into_iter()
           .map(|column| match column.as_struct_opt() {
               // Nested struct: recurse, passing down the combined null mask.
               Some(sa) => Arc::new(compute_nested_null_masks(sa.clone(), nulls.as_ref())) as _,
               // Leaf column: union the combined mask into the column's own nulls.
               None => {
                   let data = column.to_data();
                   let nulls = NullBuffer::union(nulls.as_ref(), data.nulls());
                   let builder = data.into_builder().nulls(nulls);
                   // Use an unchecked build to avoid paying a redundant O(k) validation cost for a
                   // `RecordBatch` with k leaf columns.
                   //
                   // SAFETY: The builder was constructed from an `ArrayData` we extracted from the
                   // column. The only change we make is the null buffer, via `NullBuffer::union`
                   // with input null buffers that were _also_ extracted from the column and its
                   // parent. A union can only _grow_ the set of NULL rows, so data validity is
                   // preserved. Even if the `parent_nulls` somehow had a length mismatch (which it
                   // never should, having also been extracted from our grandparent), the mismatch
                   // would have already caused `NullBuffer::union` to panic.
                   let data = unsafe { builder.build_unchecked() };
                   make_array(data)
               }
           })
           .collect();

       // Use an unchecked constructor to avoid paying a redundant O(n*k) null buffer validation
       // cost for a `RecordBatch` with n rows and k leaf columns.
       //
       // SAFETY: We are simply reassembling the input `StructArray` we previously broke apart, with
       // updated null buffers. See above for details about null buffer safety.
       unsafe { StructArray::new_unchecked(fields, columns, nulls) }
   }
   ```
   NOTE: A side advantage of the above approach is that the builder does fully recursive checking, while the struct array only validates that its (presumably already correct) children fit together nicely. So if the above were changed to use the fallible `ArrayDataBuilder::build` and `StructArray::new` instead of their unchecked counterparts, the overhead would be relatively low, because the builder is only used at leaf level.
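
   For anyone who wants to see just the recursion without the arrow-rs types, here is a minimal std-only sketch of the same idea. Everything in it (`Node`, `union_masks`, `propagate`) is a hypothetical stand-in rather than an arrow-rs API: validity is modeled as an `Option<Vec<bool>>` where `true` means the row is valid, so a null mask union becomes an element-wise AND. The length-mismatch panic follows the behavior described for `NullBuffer::union` in the SAFETY comment above.

```rust
/// Toy stand-in for an Arrow array (hypothetical, not an arrow-rs type).
/// `None` validity means "no nulls"; `true` marks a valid (non-null) row.
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Leaf { validity: Option<Vec<bool>> },
    Struct { validity: Option<Vec<bool>>, children: Vec<Node> },
}

/// Unions two null masks: a row stays valid only if it is valid on both sides.
/// Panics if both masks are present but differ in length.
fn union_masks(lhs: Option<&[bool]>, rhs: Option<&[bool]>) -> Option<Vec<bool>> {
    match (lhs, rhs) {
        (Some(l), Some(r)) => {
            assert_eq!(l.len(), r.len(), "mask length mismatch");
            Some(l.iter().zip(r).map(|(a, b)| *a && *b).collect())
        }
        (Some(m), None) | (None, Some(m)) => Some(m.to_vec()),
        (None, None) => None,
    }
}

/// Unions the parent's nulls into `node`, then recurses into any children
/// with the combined mask, so nulls flow all the way down to the leaves.
fn propagate(node: Node, parent: Option<&[bool]>) -> Node {
    match node {
        Node::Leaf { validity } => Node::Leaf {
            validity: union_masks(parent, validity.as_deref()),
        },
        Node::Struct { validity, children } => {
            let validity = union_masks(parent, validity.as_deref());
            let children = children
                .into_iter()
                .map(|child| propagate(child, validity.as_deref()))
                .collect();
            Node::Struct { validity, children }
        }
    }
}
```

   Calling `propagate(root, None)` on the top-level struct plays the role of `compute_nested_null_masks(sa, None)`: a row that is null at any ancestor ends up null in every descendant.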

