Ramin Gharib created FLINK-39651:
------------------------------------

             Summary: REGEXP predicate does not validate literal regex patterns 
at plan time and logs errors on the hot path
                 Key: FLINK-39651
                 URL: https://issues.apache.org/jira/browse/FLINK-39651
             Project: Flink
          Issue Type: Sub-task
          Components: Table SQL / API, Table SQL / Planner, Table SQL / Runtime
            Reporter: Ramin Gharib


{{SqlFunctionUtils.regExp}} at 
{{flink-table-runtime/.../SqlFunctionUtils.java:1017}}:

{code:java}
public static Boolean regExp(String s, String regex) {
    if (regex.length() == 0) {
        return false;
    }
    try {
        return (REGEXP_PATTERN_CACHE.get(regex)).matcher(s).find(0);
    } catch (Exception e) {
        LOG.error("Exception when compile and match regex:" + regex + " on: " + s, e);
        return false;
    }
}
{code}

Compiled patterns are already cached, so repeated compilation is not the 
issue. Two problems remain: a literal pattern is never validated at plan time, 
and {{LOG.error}} sits on the hot path, so an invalid literal regex still 
logs one stack trace per record.

h2. Reproducer

{code:sql}
SELECT * FROM src WHERE payload REGEXP '(';
{code}
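
For context, a minimal standalone sketch of why this literal fails. It assumes the pattern cache ultimately goes through {{java.util.regex.Pattern.compile}}; the class and method names ({{RegexpReproducer}}, {{compileError}}) are hypothetical and not part of Flink.

{code:java}
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexpReproducer {

    /**
     * Hypothetical helper: returns the compile error description for an
     * invalid pattern, or null if the pattern compiles.
     */
    static String compileError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription();
        }
    }

    public static void main(String[] args) {
        // '(' is an unclosed group, so compilation fails on every attempt.
        System.out.println("'(' -> " + compileError("("));
        // A well-formed pattern compiles and yields no error.
        System.out.println("'a.*b' -> " + compileError("a.*b"));
    }
}
{code}

Since the literal is constant for the whole query, the failure is knowable before any record is processed.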

h2. Fix

# Add {{RegexpPredicateInputTypeStrategy}}, with the same shape as the {{REGEXP_EXTRACT}} strategy, to validate literal patterns at plan time.
# Route {{BuiltInFunctionDefinitions.REGEXP}} through it.
# Drop the {{LOG.error}} and silently return {{false}} on {{PatternSyntaxException}}, so nothing is logged on the hot path.
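
A rough sketch of the intended behavior in plain Java, not the actual patch. {{RegexpFixSketch}} and {{validateLiteralPattern}} are hypothetical names that stand in for the check the new input type strategy would run on a literal argument. The runtime method compiles directly only to keep the example self-contained; the real code would presumably keep using {{REGEXP_PATTERN_CACHE}}.

{code:java}
import java.util.Optional;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexpFixSketch {

    /**
     * Plan-time check (hypothetical helper): returns an error message if the
     * literal pattern does not compile, or an empty Optional if it is valid.
     */
    static Optional<String> validateLiteralPattern(String regex) {
        try {
            Pattern.compile(regex);
            return Optional.empty();
        } catch (PatternSyntaxException e) {
            return Optional.of("Invalid regular expression for REGEXP: " + e.getMessage());
        }
    }

    /**
     * Runtime path without logging: a pattern that fails to compile simply
     * yields false. Null handling is assumed to happen in the caller.
     */
    static Boolean regExp(String s, String regex) {
        if (regex.length() == 0) {
            return false;
        }
        try {
            // Assumption: the real implementation would look the pattern up in its cache.
            return Pattern.compile(regex).matcher(s).find(0);
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(validateLiteralPattern("(").isPresent());
        System.out.println(regExp("abc", "b"));
        System.out.println(regExp("abc", "("));
    }
}
{code}

With a check like this at plan time, the reproducer query would fail during validation. The silent {{false}} at runtime would then only matter for patterns that are not literals.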

h2. Tests

* {{RegexpPredicateInputTypeStrategyTest}}.
* Regression coverage in the predicate IT case ({{ScalarFunctionsTest}} or equivalent).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
