Ramin Gharib created FLINK-39650:
------------------------------------

             Summary: REGEXP_REPLACE does not validate literal regex patterns at plan time, recompiles per record, and logs errors on the hot path
                 Key: FLINK-39650
                 URL: https://issues.apache.org/jira/browse/FLINK-39650
             Project: Flink
          Issue Type: Sub-task
            Reporter: Ramin Gharib


{{SqlFunctionUtils.regexpReplace}} at {{flink-table-runtime/.../SqlFunctionUtils.java:426}}:

{code:java}
public static String regexpReplace(String str, String regex, String replacement) {
    ...
    try {
        return str.replaceAll(regex, Matcher.quoteReplacement(replacement));
    } catch (Exception e) {
        LOG.error(
                String.format(
                        "Exception in regexpReplace('%s', '%s', '%s')",
                        str, regex, replacement),
                e);
        return null;
    }
}
{code}

{{String.replaceAll}} calls {{Pattern.compile(regex)}} internally on every invocation. This causes two problems on the hot path:

* The pattern is recompiled for every record, even when it is a constant literal that never changes.
* For an invalid pattern, {{PatternSyntaxException}} is caught and logged at {{ERROR}} for every record, and the function returns {{null}}.
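
To illustrate the first point, here is a minimal sketch of compiling the pattern once and reusing it for each record. {{CachedRegexpReplace}} is a hypothetical class written for this example; it is not existing Flink code and not necessarily the shape the fix will take.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not Flink code): compile once, reuse per record. */
final class CachedRegexpReplace {
    private final Pattern pattern;
    private final String quotedReplacement;

    CachedRegexpReplace(String regex, String replacement) {
        // An invalid regex throws PatternSyntaxException here, once,
        // instead of once per record.
        this.pattern = Pattern.compile(regex);
        // Same semantics as the current code: the replacement is taken literally.
        this.quotedReplacement = Matcher.quoteReplacement(replacement);
    }

    String apply(String str) {
        if (str == null) {
            return null;
        }
        // Reuses the compiled pattern; no Pattern.compile on the hot path.
        return pattern.matcher(str).replaceAll(quotedReplacement);
    }
}
```

With a constant pattern, one instance could be created when the function is set up and {{apply}} called per record.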

h2. Reproducer

{code:sql}
SELECT REGEXP_REPLACE(payload, '(', 'X') FROM src;
{code}
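
The literal {{'('}} is an unclosed group, so {{Pattern.compile}} rejects it. A plan-time check for literal patterns could look roughly like the sketch below. {{RegexLiteralCheck}} and its {{validate}} method are assumptions made for illustration only; they are not part of Flink's API.

```java
import java.util.Optional;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

/** Hypothetical validator (not Flink code) for a literal regex argument. */
final class RegexLiteralCheck {
    /** Returns an error message if the literal is not a valid java.util.regex pattern. */
    static Optional<String> validate(String regexLiteral) {
        try {
            Pattern.compile(regexLiteral);
            return Optional.empty();
        } catch (PatternSyntaxException e) {
            // Surface the problem once, up front, rather than per record at runtime.
            return Optional.of(
                    "Invalid regular expression '" + regexLiteral + "': " + e.getDescription());
        }
    }
}
```

Calling {{RegexLiteralCheck.validate("(")}} returns a non-empty result, while a valid pattern such as {{"a+"}} returns an empty one.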

h2. Fix

# Add {{RegexpReplaceInputTypeStrategy}}, with the same shape as the {{REGEXP_EXTRACT}} strategy.
# Route {{BuiltInFunctionDefinitions.REGEXP_REP



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
