rluvaton opened a new pull request, #9970:
URL: https://github.com/apache/arrow-rs/pull/9970
# Which issue does this PR close?
N/A
# Rationale for this change
When working with lists or other variable-size arrays, you can't operate on
the underlying values/bytes of the array as-is, because null entries might
point to non-empty value ranges.
Cases where this is useful:
1. Lambda functions on lists: the values that sit behind null entries need
to be removed first.
2. The `explode` SQL function: list values behind nulls cannot be kept.
3. Kernels that use the list values without needing to check whether each
value should be processed, for example an `array_distinct` implementation
that keeps only the unique items in each list.
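To make the problem concrete, here is a minimal sketch that does not use the arrow crates. `SimpleList`, `naive_sum`, `null_aware_sum` and `example` are hypothetical names made up for illustration; the struct only mimics the offsets / validity / child-values layout of a list array.

```rust
/// Hypothetical, simplified stand-in for a list array (not an arrow-rs type).
/// Entry `i` owns `values[offsets[i]..offsets[i + 1]]`, and
/// `validity[i] == false` marks entry `i` as null.
struct SimpleList {
    offsets: Vec<usize>,
    validity: Vec<bool>,
    values: Vec<i32>,
}

/// Operates on the child values directly and ignores validity.
/// This also counts values that sit behind null entries.
fn naive_sum(list: &SimpleList) -> i32 {
    list.values.iter().sum()
}

/// Visits only the value ranges of valid (non-null) entries.
fn null_aware_sum(list: &SimpleList) -> i32 {
    (0..list.validity.len())
        .filter(|&i| list.validity[i])
        .map(|i| {
            list.values[list.offsets[i]..list.offsets[i + 1]]
                .iter()
                .sum::<i32>()
        })
        .sum()
}

/// Logical content is `[[1, 2], null, [5]]`, but the null entry still
/// spans the child values `[3, 4]`.
fn example() -> SimpleList {
    SimpleList {
        offsets: vec![0, 2, 4, 5],
        validity: vec![true, false, true],
        values: vec![1, 2, 3, 4, 5],
    }
}
```

With this input the two sums disagree, because the naive version also processes the values hidden behind the null entry. Once those hidden values are removed, a kernel can work on the child values without consulting the validity of each list entry.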
# What changes are included in this PR?
Added a `cleanup_non_empty_nulls` module to `arrow-select`, which includes 2
functions:
1. `cleanup_non_empty_nulls`, which contains the logic for removing the
non-empty values behind nulls
2. `has_non_empty_nulls`, which can be called before
`cleanup_non_empty_nulls` to check whether the expensive work is even needed

Benchmarks for the cleanup were also added.
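As a rough illustration of what the check and the cleanup do, here is a sketch on a simplified list model. It is not the code in this PR: the function names are borrowed from the description, but the signatures and the `SimpleList` type are assumptions, and the real kernel works on Arrow arrays rather than plain `Vec`s.

```rust
/// Hypothetical, simplified stand-in for a list array (not an arrow-rs type).
/// Entry `i` owns `values[offsets[i]..offsets[i + 1]]`, and
/// `validity[i] == false` marks entry `i` as null.
#[derive(Debug, Clone, PartialEq)]
struct SimpleList {
    offsets: Vec<usize>,
    validity: Vec<bool>,
    values: Vec<i32>,
}

/// Cheap pre-check: is there any null entry whose offset range is non-empty?
/// (Assumed signature, for illustration only.)
fn has_non_empty_nulls(list: &SimpleList) -> bool {
    list.validity
        .iter()
        .enumerate()
        .any(|(i, &valid)| !valid && list.offsets[i + 1] > list.offsets[i])
}

/// Rebuilds offsets and values so that every null entry has an empty range.
/// Valid entries keep their values in the original order.
/// (Assumed signature, for illustration only.)
fn cleanup_non_empty_nulls(list: &SimpleList) -> SimpleList {
    let mut offsets = Vec::with_capacity(list.offsets.len());
    let mut values = Vec::with_capacity(list.values.len());
    offsets.push(0);
    for (i, &valid) in list.validity.iter().enumerate() {
        if valid {
            // Copy only the values that belong to a valid entry.
            values.extend_from_slice(&list.values[list.offsets[i]..list.offsets[i + 1]]);
        }
        // A null entry repeats the previous offset, giving it an empty range.
        offsets.push(values.len());
    }
    SimpleList {
        offsets,
        validity: list.validity.clone(),
        values,
    }
}
```

A caller would typically run the check first and only pay for the rebuild when it returns `true`. The sketch copies values entry by entry to keep the idea visible; it makes no claim about how the PR's implementation performs the copy.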
Originally I wanted to add the function on `ListArray`, `StringArray`, and
so on, but because the implementation uses `take` and `interleave` we cannot
do that.
# Are these changes tested?
Yes
# Are there any user-facing changes?
Yes, a new kernel.