[ https://issues.apache.org/jira/browse/FLINK-39651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ramin Gharib reassigned FLINK-39651:
------------------------------------

    Assignee: Ramin Gharib

> REGEXP predicate does not validate literal regex patterns at plan time and 
> logs errors on the hot path
> ------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-39651
>                 URL: https://issues.apache.org/jira/browse/FLINK-39651
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Table SQL / API, Table SQL / Planner, Table SQL / Runtime
>            Reporter: Ramin Gharib
>            Assignee: Ramin Gharib
>            Priority: Major
>
> {{SqlFunctionUtils.regExp}} at {{flink-table-runtime/.../SqlFunctionUtils.java:1017}}:
> {code:java}
> public static Boolean regExp(String s, String regex) {
>     if (regex.length() == 0) {
>         return false;
>     }
>     try {
>         return (REGEXP_PATTERN_CACHE.get(regex)).matcher(s).find(0);
>     } catch (Exception e) {
>         LOG.error("Exception when compile and match regex:" + regex + " on: " + s, e);
>         return false;
>     }
> }
> {code}
> Pattern compilation is already cached. The remaining problem is the
> {{LOG.error}} on the hot path: a bad literal regex still produces one stack
> trace per record.
> h2. Reproducer
> {code:sql}
> SELECT * FROM src WHERE payload REGEXP '(';
> {code}
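A quick standalone check shows why the reproducer misbehaves: `'('` opens a group that is never closed, so `java.util.regex.Pattern.compile` rejects it with a `PatternSyntaxException`. The class and method names below are illustrative only and are not part of Flink.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Illustrative helper (hypothetical name), not Flink code.
public class RegexpLiteralCheck {

    /** Returns null if the pattern compiles, otherwise the PatternSyntaxException description. */
    public static String compileError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription();
        }
    }

    public static void main(String[] args) {
        // The literal from the reproducer: fails to compile.
        System.out.println(compileError("("));
        // A well-formed pattern: compiles, so null is printed.
        System.out.println(compileError("a+b"));
    }
}
```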
> h2. Fix
> # Add {{RegexpPredicateInputTypeStrategy}}, with the same shape as the
> {{REGEXP_EXTRACT}} strategy, so that a literal pattern that does not compile is
> rejected at plan time.
> # Route {{BuiltInFunctionDefinitions.REGEXP}} through it.
> # Drop the {{LOG.error}} and silently return {{false}} on
> {{PatternSyntaxException}}, so nothing is logged on the hot path.
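The fix steps above can be sketched outside Flink in plain Java. This is a minimal sketch under stated assumptions: `RegexpFixSketch`, `validateLiteralPattern`, the `ConcurrentHashMap` cache, and the error text are hypothetical stand-ins. In Flink the plan-time check would live in `RegexpPredicateInputTypeStrategy` and the runtime path in `SqlFunctionUtils.regExp`, whose actual code may differ.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical sketch of the proposed behavior; not Flink's implementation.
public final class RegexpFixSketch {

    // Hypothetical stand-in for REGEXP_PATTERN_CACHE; the real cache in SqlFunctionUtils differs.
    private static final Map<String, Pattern> CACHE = new ConcurrentHashMap<>();

    /**
     * Plan-time check (step 1): for a literal pattern, return an error message if it
     * does not compile, or empty if it is valid. The message text is hypothetical.
     */
    public static Optional<String> validateLiteralPattern(String regex) {
        try {
            Pattern.compile(regex);
            return Optional.empty();
        } catch (PatternSyntaxException e) {
            return Optional.of("Invalid regular expression for REGEXP: " + e.getDescription());
        }
    }

    /** Runtime path (step 3): no logging; an invalid pattern simply yields false. */
    public static Boolean regExp(String s, String regex) {
        if (regex.isEmpty()) {
            return false;
        }
        try {
            return CACHE.computeIfAbsent(regex, Pattern::compile).matcher(s).find(0);
        } catch (PatternSyntaxException e) {
            // Still reachable when the pattern comes from a column rather than a literal,
            // since only literals can be checked at plan time.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(validateLiteralPattern("("));
        System.out.println(regExp("abc", "b+"));
        System.out.println(regExp("abc", "("));
    }
}
```

The runtime `catch` stays because only literal patterns can be checked during planning: a pattern read from a column can still fail to compile for each record, and in that case this sketch returns `false` without logging, as step 3 describes.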
> h2. Tests
> * {{RegexpPredicateInputTypeStrategyTest}}.
> * Regression coverage in the predicate IT case ({{ScalarFunctionsTest}} or
> equivalent).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
