rluvaton opened a new pull request, #9711:
URL: https://github.com/apache/arrow-rs/pull/9711

   # Which issue does this PR close?
   
   N/A
   
   # Rationale for this change
   
   In variable-length array types (e.g., `StringArray`, `ListArray`), a null
   entry may still have a non-empty offset range, meaning the underlying data
   buffer contains data behind the null. This matters when working directly on
   the underlying values of variable-length data, for example when unwrapping
   (flattening) a list array, because the child values are exposed, including
   those behind null entries. If null entries point to non-empty ranges, the
   unwrapped values will contain data that may not be meaningful to operate on
   and could cause errors (e.g., division by zero in the child values).
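
To make the problem concrete, here is a minimal std-only sketch. It does not use the arrow-rs types: the offsets, validity, and child values are modeled as plain slices, and `values_behind_nulls` and `example` are hypothetical names introduced only for this illustration.

```rust
/// Hypothetical helper (not part of arrow-rs): collects the child values that
/// sit behind null entries of a list-like array.
///
/// `offsets` has `validity.len() + 1` entries; entry `i` spans
/// `values[offsets[i]..offsets[i + 1]]`.
fn values_behind_nulls(offsets: &[i32], validity: &[bool], values: &[i64]) -> Vec<i64> {
    assert_eq!(
        offsets.len(),
        validity.len() + 1,
        "offsets must have one more entry than validity"
    );
    let mut hidden = Vec::new();
    // Each window of two offsets is the [start, end) range of one entry.
    for (range, &is_valid) in offsets.windows(2).zip(validity) {
        if !is_valid {
            let (start, end) = (range[0] as usize, range[1] as usize);
            hidden.extend_from_slice(&values[start..end]);
        }
    }
    hidden
}

/// Assumed example data: the list array `[[10, 20], null, [30]]`, where the
/// null entry still spans two child values (both zero).
fn example() -> (Vec<i32>, Vec<bool>, Vec<i64>) {
    let offsets = vec![0, 2, 4, 5];
    let validity = vec![true, false, true];
    let values = vec![10, 20, 0, 0, 30];
    (offsets, validity, values)
}
```

With this data, flattening exposes all five child values, including the two zeros that only the null entry points to. A kernel that divides by every child value would then fail on data that no valid row references.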
   
   
   Cases where this will be helpful:
   - flattening a list array
   - casting lists/maps: we don't want to cast values that are not used, so
     this checks whether any exist
   - explode on a list: we don't want the values behind nulls, so this checks
     whether any exist (another PR will follow to clean up the empty values)
   - gc on lists/maps/strings to remove unneeded data
   
   # What changes are included in this PR?
   Add an `OffsetBuffer::is_there_null_pointing_to_non_empty_value` method that
   checks whether any null position corresponds to a non-empty offset range.
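
The description above does not show the method's signature or body, so the following is only a std-only sketch of the kind of check it describes, not the PR's implementation. The function name `has_null_with_non_empty_range` and the use of `Option<&[bool]>` as a stand-in for a null buffer are assumptions made for illustration.

```rust
/// Simplified stand-in for the check described above (NOT the PR's code).
///
/// `offsets` has one more entry than the array has rows; `validity` is
/// `None` when the array has no null entries at all.
fn has_null_with_non_empty_range(offsets: &[i32], validity: Option<&[bool]>) -> bool {
    let validity = match validity {
        Some(v) => v,
        // No validity information means no nulls, so nothing can be hidden.
        None => return false,
    };
    assert_eq!(
        offsets.len(),
        validity.len() + 1,
        "offsets must have one more entry than validity"
    );
    // A null row is a problem only if its [start, end) range is non-empty.
    offsets
        .windows(2)
        .zip(validity)
        .any(|(range, &is_valid)| !is_valid && range[1] > range[0])
}
```

Under these assumptions, offsets `[0, 2, 4, 5]` with validity `[true, false, true]` report `true`, because the null row spans two values. Offsets `[0, 2, 2, 3]` with the same validity report `false`, because the null row is empty.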
   
   # Are these changes tested?
   
   Yes
   
   # Are there any user-facing changes?
   
   Yes, a new public method 
`OffsetBuffer::is_there_null_pointing_to_non_empty_value` is added.
   
   
   -------
   
   
   Related to:
   - https://github.com/apache/datafusion/pull/18921, which needs to unwrap
     the list values and get only the reachable ones


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
