rluvaton opened a new pull request, #9970:
URL: https://github.com/apache/arrow-rs/pull/9970

   # Which issue does this PR close?
   
   N/A
   
   # Rationale for this change
   When working with lists or other variable-size arrays, you can't operate on the 
underlying values/bytes as is, because null entries might point to 
non-empty values.
   
   Cases when this is useful:
   1. Lambda functions on lists - we need to remove the values that
   sit behind null entries
   2. `explode` SQL function - list values behind nulls cannot be kept
   3. Kernels that use the list values without needing to check whether each
      value should be processed - for example, implementing
      `array_distinct`, which keeps only the unique items in each list
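   
The problem described above can be illustrated with a small self-contained model. This is only a sketch: `ListModel` and both helper functions are hypothetical stand-ins built on plain `Vec`s, not arrow-rs types. It shows how operating directly on the flattened child values picks up data that sits behind a null slot.

```rust
/// Simplified stand-in for a list array of i32 (hypothetical, not the arrow-rs `ListArray`).
struct ListModel {
    /// `offsets[i]..offsets[i + 1]` is the range of `values` covered by slot `i`.
    offsets: Vec<usize>,
    /// `validity[i] == false` marks slot `i` as null.
    validity: Vec<bool>,
    /// Flattened child values of all slots.
    values: Vec<i32>,
}

/// Sums the flattened child values directly, ignoring the validity flags.
fn sum_child_values(list: &ListModel) -> i32 {
    list.values.iter().sum()
}

/// Sums only the values that belong to non-null slots.
fn sum_valid_values(list: &ListModel) -> i32 {
    (0..list.validity.len())
        .filter(|&i| list.validity[i])
        .map(|i| {
            list.values[list.offsets[i]..list.offsets[i + 1]]
                .iter()
                .sum::<i32>()
        })
        .sum()
}

/// Logical content `[[1, 2], null, [5]]`, where the null slot still covers `[3, 4]`.
fn example() -> ListModel {
    ListModel {
        offsets: vec![0, 2, 4, 5],
        validity: vec![true, false, true],
        values: vec![1, 2, 3, 4, 5],
    }
}
```

For `example()`, `sum_child_values` returns 15 while `sum_valid_values` returns 8, because the values `3` and `4` are still stored behind the null slot.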
   
   # What changes are included in this PR?
   
   This PR adds a `cleanup_non_empty_nulls` module to `arrow-select` with 2 
functions, plus benchmarks:
   1. `cleanup_non_empty_nulls`, which contains the logic for removing non-empty 
null values
   2. `has_non_empty_nulls`, which can be called before 
`cleanup_non_empty_nulls` to check whether the expensive work is even needed
   3. Benchmarks for the cleanup
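   
The check-then-clean pattern described above can be sketched on a simplified model. Everything here is an assumption made for illustration: `ListModel`, `has_non_empty_nulls_model`, and `cleanup_non_empty_nulls_model` are hypothetical names, and the plain copy loop below is not the PR's implementation or its actual signatures.

```rust
/// Simplified stand-in for a list array of i32 (hypothetical, not an arrow-rs type).
struct ListModel {
    /// `offsets[i]..offsets[i + 1]` is the range of `values` covered by slot `i`.
    offsets: Vec<usize>,
    /// `validity[i] == false` marks slot `i` as null.
    validity: Vec<bool>,
    /// Flattened child values of all slots.
    values: Vec<i32>,
}

/// Cheap check: does any null slot cover a non-empty range of values?
fn has_non_empty_nulls_model(list: &ListModel) -> bool {
    (0..list.validity.len())
        .any(|i| !list.validity[i] && list.offsets[i + 1] > list.offsets[i])
}

/// Rebuilds offsets and values so that every null slot covers an empty range.
fn cleanup_non_empty_nulls_model(list: &ListModel) -> ListModel {
    let mut offsets = Vec::with_capacity(list.offsets.len());
    let mut values = Vec::with_capacity(list.values.len());
    offsets.push(0);
    for i in 0..list.validity.len() {
        if list.validity[i] {
            // Copy only the values owned by valid slots.
            values.extend_from_slice(&list.values[list.offsets[i]..list.offsets[i + 1]]);
        }
        // A null slot repeats the previous offset, giving it zero length.
        offsets.push(values.len());
    }
    ListModel {
        offsets,
        validity: list.validity.clone(),
        values,
    }
}
```

In this model a caller would run the check first and rebuild only when it returns `true`, so arrays that are already clean skip the copy.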
   
   Originally I wanted to add the function on `ListArray`, `StringArray`, and 
so on, but because it uses `take` and `interleave` we cannot do that
    
   # Are these changes tested?
   
   Yes
   
   # Are there any user-facing changes?
   
   Yes, a new kernel


