neilconway commented on code in PR #22361:
URL: https://github.com/apache/datafusion/pull/22361#discussion_r3268232375


##########
datafusion/functions/src/regex/regexplike.rs:
##########
@@ -121,6 +121,13 @@ impl ScalarUDFImpl for RegexpLikeFunc {
         })
     }
 
+    /// Some literal patterns can make regex compilation too expensive during
+    /// planning. Skip constant folding so the cost is paid at execution time.
+    /// See https://github.com/apache/datafusion/issues/22185.

Review Comment:
   Do other regexp functions/operations exhibit similar properties?



##########
datafusion/core/tests/optimizer/mod.rs:
##########
@@ -136,6 +136,37 @@ fn concat_ws_literals() -> Result<()> {
     Ok(())
 }
 
+/// Regression test for https://github.com/apache/datafusion/issues/22185.
+///
+/// `regexp_like('aaaaa', 'a{5}{5}{5}{5}{5}{5}{5}{5}')` used to stall planning
+/// while `regex::Regex::new` walked the crate's `size_limit`. The optimizer
+/// should leave the expression intact and avoid that work during planning.
+#[test]
+fn regexp_like_pathological_pattern_does_not_stall_planning() {
+    let pattern = format!("a{}", "{5}".repeat(8));
+    let sql = format!("SELECT regexp_like('aaaaa', '{pattern}') AS m");
+
+    let t0 = datafusion_common::instant::Instant::now();
+    let plan = test_sql(&sql).expect("optimize should succeed");
+    let elapsed = t0.elapsed();
+
+    // 100ms is comfortably above the post-fix wall time (a few ms even on
+    // CI) and well below the >1s DoS reported on the issue.
+    assert!(
+        elapsed < std::time::Duration::from_millis(100),
+        "planning took {elapsed:?} (expected < 100ms)"
+    );

Review Comment:
   I think unit tests that depend on wallclock time should be avoided if at all 
possible.
   
   In this case, I think it is fine to just test the goal state (e.g., we don't 
evaluate the expression at planning-time), not the actual observed behavior 
(planning is slow).



##########
datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs:
##########
@@ -677,6 +682,18 @@ impl ConstEvaluator {
                 }
                 true
             }
+            // Regex operators compile the right-hand pattern during
+            // evaluation. Avoid constant folding for literal patterns so
+            // planning does not pay that cost. See
+            // https://github.com/apache/datafusion/issues/22185.
+            Expr::BinaryExpr(BinaryExpr {
+                op:
+                    Operator::RegexMatch
+                    | Operator::RegexIMatch
+                    | Operator::RegexNotMatch
+                    | Operator::RegexNotIMatch,
+                ..
+            }) => false,

Review Comment:
   Seems like this change is not tested? And should also be mentioned more 
prominently in the PR description.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to