alamb commented on code in PR #2258:
URL: https://github.com/apache/arrow-rs/pull/2258#discussion_r935892671
##########
arrow/src/compute/kernels/comparison.rs:
##########
@@ -296,6 +298,39 @@ pub fn like_utf8_scalar<OffsetSize: OffsetSizeTrait>(
Ok(BooleanArray::from(data))
}
+fn replace_like_wildcards(text: &str) -> Result<String> {
Review Comment:
It would help me review this code if we can include a docstring on this
function explaining what replacements it was doing.
Like explaining it was trying to escape `%` with `\\%` perhaps
##########
arrow/Cargo.toml:
##########
@@ -49,6 +49,7 @@ half = { version = "2.0", default-features = false }
hashbrown = { version = "0.12", default-features = false }
csv_crate = { version = "1.1", default-features = false, optional = true,
package="csv" }
regex = { version = "1.5.6", default-features = false, features = ["std",
"unicode"] }
+regex-syntax = { version = "0.6.27", default-features = false, features =
["unicode"] }
Review Comment:
I am always worried about new dependencies in arrow but I double checked and
this is already a dependency of `regex` so I don't think it is "net-new" to
arrow. 👍
https://crates.io/crates/regex-syntax/reverse_dependencies
##########
arrow/src/compute/kernels/comparison.rs:
##########
@@ -3740,6 +3775,50 @@ mod tests {
vec![false, true, false, false]
);
+ test_utf8_scalar!(
+ test_utf8_scalar_like_escape,
+ vec!["a%", "a\\x"],
+ "a\\%",
+ like_utf8_scalar,
+ vec![true, false]
+ );
+
+ test_utf8!(
+ test_utf8_scalar_ilike_regex,
+ vec!["%%%"],
+ vec![r#"\%_\%"#],
+ ilike_utf8,
+ vec![true]
+ );
+
+ #[test]
+ fn test_replace_like_wildcards() {
+ let a_eq = "_%";
+ let expected = String::from("..*");
Review Comment:
You can probably do like this for some brevity:
```suggestion
let expected = "..*";
```
And if the assert below gives you issues, maybe it needs something like an
extra `&`:
```rust
assert_eq!(&replace_like_wildcards(a_eq).unwrap(), expected);
```
##########
arrow/src/compute/kernels/comparison.rs:
##########
@@ -342,7 +377,7 @@ pub fn nlike_utf8_scalar<OffsetSize: OffsetSizeTrait>(
result.append(!left.value(i).ends_with(&right[1..]));
}
} else {
- let re_pattern = escape(right).replace('%', ".*").replace('_', ".");
+ let re_pattern = replace_like_wildcards(right)?;
Review Comment:
I really like how you have refactored this code 👍
##########
arrow/src/compute/kernels/comparison.rs:
##########
@@ -3740,6 +3775,50 @@ mod tests {
vec![false, true, false, false]
);
+ test_utf8_scalar!(
Review Comment:
❤️ these test are very nice
##########
arrow/src/compute/kernels/comparison.rs:
##########
@@ -3740,6 +3775,50 @@ mod tests {
vec![false, true, false, false]
);
+ test_utf8_scalar!(
+ test_utf8_scalar_like_escape,
+ vec!["a%", "a\\x"],
+ "a\\%",
+ like_utf8_scalar,
+ vec![true, false]
+ );
+
+ test_utf8!(
+ test_utf8_scalar_ilike_regex,
+ vec!["%%%"],
+ vec![r#"\%_\%"#],
+ ilike_utf8,
+ vec![true]
+ );
+
+ #[test]
+ fn test_replace_like_wildcards() {
+ let a_eq = "_%";
+ let expected = String::from("..*");
+ assert_eq!(replace_like_wildcards(a_eq).unwrap(), expected);
+ }
+
+ #[test]
+ fn test_replace_like_wildcards_leave_like_meta_chars() {
+ let a_eq = "\\%\\_";
+ let expected = String::from("%_");
+ assert_eq!(replace_like_wildcards(a_eq).unwrap(), expected);
+ }
+
+ #[test]
+ fn test_replace_like_wildcards_with_multiple_escape_chars() {
+ let a_eq = "\\\\%";
Review Comment:
I think it may be worth testing `\\\%` (aka unbalanced, three slashes)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]