alamb opened a new issue #697:
URL: https://github.com/apache/arrow-rs/issues/697


   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   In https://github.com/apache/arrow-datafusion/pull/870, @b41sh added support 
for filtering all values that do/do not match a particular regular expression. 
However, it uses the (only available at time of writing) 
[`regexp_match`](https://docs.rs/arrow/5.2.0/arrow/compute/kernels/regexp/fn.regexp_match.html)
 kernel, which returns the actual matches (as a `ListArray`) rather than just a 
"true/false" value (`BooleanArray`) indicating whether each row matched. This is suboptimal 
because:
   1. It is more work to construct a `ListArray` than a `BooleanArray`
   2. There is extra work to then turn the `ListArray` back into a 
`BooleanArray`
   
   **Describe the solution you'd like**
   Add an arrow compute kernel (perhaps in the 
[`comparison`](https://docs.rs/arrow/5.2.0/arrow/compute/kernels/comparison/index.html)
 module) with a signature like the one below.
   
   A better name is TBD: `regexp_matches_utf8` parallels `like_utf8` but is 
perhaps too similar to `regexp_match`.
   
   ```rust
   pub fn regexp_matches_utf8<OffsetSize: StringOffsetSizeTrait>(
       array: &GenericStringArray<OffsetSize>, 
       regex_array: &GenericStringArray<OffsetSize>, 
       flags_array: Option<&GenericStringArray<OffsetSize>>
   ) -> Result<BooleanArray>
   ```
   
   Where each element of the resulting `BooleanArray` is
   * true if there were 1 or more matches of the regex array/flags 
   * false if there were 0 matches of the regex array/flags
   * NULL if the input or regexp value for that row was null (the same null 
semantics as `regexp_match` and `like_utf8`)
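
The row-wise semantics listed above can be sketched over plain slices, without any Arrow types. This is only an illustration under stated assumptions: `regexp_matches_sketch` and `contains_literal` are hypothetical names, the matcher is passed in as a closure, and a plain substring test stands in for a real regular-expression engine. The flags array is left out entirely.

```rust
/// Hypothetical stand-in for a regex test: a plain substring check.
/// A real kernel would compile `pattern` with a regex engine instead.
fn contains_literal(value: &str, pattern: &str) -> bool {
    value.contains(pattern)
}

/// Sketch of the proposed semantics on slices of optional strings.
/// `None` models a null slot in an Arrow array.
pub fn regexp_matches_sketch<F>(
    values: &[Option<&str>],
    patterns: &[Option<&str>],
    is_match: F,
) -> Result<Vec<Option<bool>>, String>
where
    F: Fn(&str, &str) -> bool,
{
    // Assumption: both inputs must have the same length, one pattern per row.
    if values.len() != patterns.len() {
        return Err(format!(
            "length mismatch: {} values vs {} patterns",
            values.len(),
            patterns.len()
        ));
    }

    Ok(values
        .iter()
        .zip(patterns.iter())
        .map(|(v, p)| match (*v, *p) {
            // Both present: true on a match, false otherwise.
            (Some(v), Some(p)) => Some(is_match(v, p)),
            // Either side null: the output slot is null.
            _ => None,
        })
        .collect())
}
```

The output is built in a single pass, one boolean slot per row, with no intermediate list of matches to construct and then collapse.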
   
   **Describe alternatives you've considered**
   None yet
   
   **Additional context**
   See use in https://github.com/apache/arrow-datafusion/pull/870


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


