[
https://issues.apache.org/jira/browse/FLINK-39651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ramin Gharib reassigned FLINK-39651:
------------------------------------
Assignee: Ramin Gharib
> REGEXP predicate does not validate literal regex patterns at plan time and
> logs errors on the hot path
> ------------------------------------------------------------------------------------------------------
>
> Key: FLINK-39651
> URL: https://issues.apache.org/jira/browse/FLINK-39651
> Project: Flink
> Issue Type: Sub-task
> Components: Table SQL / API, Table SQL / Planner, Table SQL / Runtime
> Reporter: Ramin Gharib
> Assignee: Ramin Gharib
> Priority: Major
>
> {{SqlFunctionUtils.regExp}} at
> {{flink-table-runtime/.../SqlFunctionUtils.java:1017}}:
> {code:java}
> public static Boolean regExp(String s, String regex) {
>     if (regex.length() == 0) {
>         return false;
>     }
>     try {
>         return (REGEXP_PATTERN_CACHE.get(regex)).matcher(s).find(0);
>     } catch (Exception e) {
>         LOG.error("Exception when compile and match regex:" + regex + " on: " + s, e);
>         return false;
>     }
> }
> {code}
> Pattern compilation is already cached. The remaining problem is the
> {{LOG.error}} on the hot path: an invalid literal regex fails for every
> record, so it still produces one stack trace per record.
> h2. Reproducer
> {code:sql}
> SELECT * FROM src WHERE payload REGEXP '(';
> {code}
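For illustration, a minimal standalone sketch of why the reproducer is costly. It uses only {{java.util.regex}} and no Flink classes. The class and method names are hypothetical, and it compiles the pattern directly instead of going through the real pattern cache, which is an assumption made to keep the example self-contained.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper, not part of Flink.
class ReproducerSketch {

    /** Returns true if java.util.regex accepts the pattern. */
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    /**
     * Counts how many records end up in the catch block. For an invalid
     * literal pattern this is every record, which is where the current
     * code would emit one LOG.error (with stack trace) per record.
     */
    static int failuresPerBatch(String regex, String[] records) {
        int failures = 0;
        for (String record : records) {
            try {
                Pattern.compile(regex).matcher(record).find(0);
            } catch (PatternSyntaxException e) {
                failures++; // stand-in for the LOG.error call
            }
        }
        return failures;
    }
}
```

With the pattern from the reproducer, {{failuresPerBatch("(", records)}} equals the number of records, since the unclosed group is rejected on every attempt.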
> h2. Fix
> # Add {{RegexpPredicateInputTypeStrategy}}, with the same shape as the
> {{REGEXP_EXTRACT}} strategy, so that literal patterns are validated at plan time.
> # Route {{BuiltInFunctionDefinitions.REGEXP}} through it.
> # Drop the {{LOG.error}} and silently return {{false}} on
> {{PatternSyntaxException}}, so that nothing is logged on the hot path.
> h2. Tests
> * {{RegexpPredicateInputTypeStrategyTest}}.
> * Regression coverage in the predicate IT case ({{ScalarFunctionsTest}} or
> equivalent).
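
The two halves of the Fix section above could be sketched roughly as follows. This is a plain-Java approximation under stated assumptions, not the actual patch: {{RegexpFixSketch}}, {{validateLiteral}} and the {{ConcurrentHashMap}} cache are hypothetical stand-ins, and the real change would go through Flink's input type strategy and its existing pattern cache.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical sketch, not the Flink implementation.
class RegexpFixSketch {

    // Stand-in for the runtime pattern cache (assumption: the real cache differs).
    private static final Map<String, Pattern> CACHE = new ConcurrentHashMap<>();

    /**
     * Plan-time check. An empty argument means the pattern is not a literal,
     * so nothing can be validated yet. Returns an error message for an
     * invalid literal, or empty if the literal is valid.
     */
    static Optional<String> validateLiteral(Optional<String> literalRegex) {
        if (!literalRegex.isPresent()) {
            return Optional.empty();
        }
        try {
            Pattern.compile(literalRegex.get());
            return Optional.empty();
        } catch (PatternSyntaxException e) {
            return Optional.of("Invalid regular expression for REGEXP: " + e.getDescription());
        }
    }

    /**
     * Runtime path without logging: an invalid (non-literal) pattern simply
     * yields false. Null handling is omitted in this sketch.
     */
    static Boolean regExp(String s, String regex) {
        if (regex.length() == 0) {
            return false;
        }
        try {
            return CACHE.computeIfAbsent(regex, Pattern::compile).matcher(s).find(0);
        } catch (PatternSyntaxException e) {
            return false; // no LOG.error on the hot path
        }
    }
}
```

Under this split, a bad literal such as {{'('}} is rejected once during planning, while patterns that only become known at runtime (for example, read from a column) fall back to returning {{false}} without any per-record log output.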
--
This message was sent by Atlassian Jira
(v8.20.10#820010)