crepererum commented on code in PR #4646: URL: https://github.com/apache/arrow-datafusion/pull/4646#discussion_r1059877878
##########
datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs:
##########
@@ -1576,17 +1589,166 @@ mod tests {
         assert_eq!(simplify(expr), expected)
     }
+    #[test]
+    fn test_simplify_regex() {
+        // malformed regex
+        assert_contains!(
+            try_simplify(regex_match(col("c1"), lit("foo{")))
+                .unwrap_err()
+                .to_string(),
+            "regex parse error"
+        );
+
+        // unsupported cases
+        assert_no_change(regex_match(col("c1"), lit("foo.*")));
+        assert_no_change(regex_match(col("c1"), lit("(foo)")));
+        assert_no_change(regex_match(col("c1"), lit("^foo")));
+        assert_no_change(regex_match(col("c1"), lit("foo$")));
+        assert_no_change(regex_match(col("c1"), lit("%")));
+        assert_no_change(regex_match(col("c1"), lit("_")));
+        assert_no_change(regex_match(col("c1"), lit("f%o")));
+        assert_no_change(regex_match(col("c1"), lit("f_o")));
+
+        // empty cases
+        assert_change(regex_match(col("c1"), lit("")), like(col("c1"), "%"));
+        assert_change(
+            regex_not_match(col("c1"), lit("")),
+            not_like(col("c1"), "%"),
+        );
+        assert_change(regex_imatch(col("c1"), lit("")), ilike(col("c1"), "%"));
+        assert_change(
+            regex_not_imatch(col("c1"), lit("")),
+            not_ilike(col("c1"), "%"),
+        );
+
+        // single character
+        assert_change(regex_match(col("c1"), lit("x")), like(col("c1"), "%x%"));
+
+        // single word
+        assert_change(regex_match(col("c1"), lit("foo")), like(col("c1"), "%foo%"));
+
+        // OR-chain
+        assert_change(
+            regex_match(col("c1"), lit("foo|bar|baz")),

Review Comment:
   At some point we could extend the rewrite logic to detect fully anchored regular expressions (i.e. ones that must match the whole string rather than any substring). In that case we could emit EQ/NEQ/IN instead of LIKE.
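
The fully anchored case suggested in the review comment could be sketched roughly as follows. This is a standalone illustration that works on the raw pattern string and is not the PR's implementation: `AnchoredRewrite`, `is_plain_literal`, and `anchored_rewrite` are hypothetical names, and a real version would presumably inspect the parsed regex rather than the string. It only accepts `^literal$` and `^(a|b|c)$`, and rejects everything else, including `^foo|bar$`, which parses as `(^foo)|(bar$)` and is therefore not fully anchored.

```rust
/// Hypothetical result of the analysis; not a DataFusion type.
#[derive(Debug, PartialEq)]
enum AnchoredRewrite {
    /// `col ~ '^foo$'` could become `col = 'foo'` (or `!=` for a negated match).
    Eq(String),
    /// `col ~ '^(foo|bar)$'` could become `col IN ('foo', 'bar')`.
    In(Vec<String>),
}

/// True if `s` contains no regex metacharacters, so it matches only itself.
fn is_plain_literal(s: &str) -> bool {
    !s.chars().any(|c| r"\.+*?()|[]{}^$".contains(c))
}

/// Returns a rewrite only when the pattern must match the whole string.
fn anchored_rewrite(pattern: &str) -> Option<AnchoredRewrite> {
    // Both anchors are required; otherwise the regex may match a substring.
    let inner = pattern.strip_prefix('^')?.strip_suffix('$')?;

    // An alternation is fully anchored only when it is grouped: `^(a|b)$`.
    let alternatives: Vec<&str> =
        match inner.strip_prefix('(').and_then(|s| s.strip_suffix(')')) {
            Some(group) => group.split('|').collect(),
            None => vec![inner],
        };

    // Be conservative: any remaining metacharacter means "leave unchanged".
    if !alternatives.iter().all(|a| is_plain_literal(a)) {
        return None;
    }

    if alternatives.len() == 1 {
        Some(AnchoredRewrite::Eq(alternatives[0].to_string()))
    } else {
        Some(AnchoredRewrite::In(
            alternatives.iter().map(|a| a.to_string()).collect(),
        ))
    }
}
```

Under these assumptions, `anchored_rewrite("^foo$")` yields `Eq("foo")`, while `anchored_rewrite("^foo.*$")` and `anchored_rewrite("foo")` yield `None`. The sketch only covers the case-sensitive operators; the case-insensitive variants have no direct EQ/IN equivalent and would need separate handling.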