[
https://issues.apache.org/jira/browse/FLINK-39649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated FLINK-39649:
-----------------------------------
Labels: pull-request-available (was: )
> REGEXP_EXTRACT plan-time validation and hot-path log cleanup
> ------------------------------------------------------------
>
> Key: FLINK-39649
> URL: https://issues.apache.org/jira/browse/FLINK-39649
> Project: Flink
> Issue Type: Sub-task
> Components: Table SQL / API, Table SQL / Planner, Table SQL / Runtime
> Reporter: Ramin Gharib
> Priority: Major
> Labels: pull-request-available
>
> SqlFunctionUtils.regexpExtract compiles the regex for every record and emits
> LOG.error on each PatternSyntaxException. When the pattern is a string
> literal, it is already known at planning time and can be validated there.
> h3. Reproducer
>
> {code:sql}
> SELECT REGEXP_EXTRACT(payload, '(', 1) FROM src;
> {code}
>
> '(' is an unbalanced group. The job plans successfully and the runtime emits
> one stack trace per record processed.
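
To make the reproducer concrete outside Flink, here is a minimal sketch using only java.util.regex. The class and method names (UnbalancedGroupDemo, compileError) are hypothetical and exist only for illustration. It shows that compiling '(' raises PatternSyntaxException, which is the exception the runtime currently logs for each record.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class, not part of Flink.
class UnbalancedGroupDemo {

    /** Returns null when the regex compiles, otherwise the compiler's error description. */
    static String compileError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription();
        }
    }

    public static void main(String[] args) {
        // Unbalanced group: a non-null description is printed.
        System.out.println(compileError("("));
        // Valid pattern: prints "null".
        System.out.println(compileError("(\\d+)"));
    }
}
```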
> h3. Fix
> # Add RegexpExtractInputTypeStrategy, which compiles a literal regex during
> inferInputTypes and fails via callContext.fail(...) if it is invalid.
> # Route BuiltInFunctionDefinitions.REGEXP_EXTRACT through it.
> # Update SqlFunctionUtils.regexpExtract to use REGEXP_PATTERN_CACHE and
> silently return null on compile failure, with no LOG.error on the hot path.
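
The steps above can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the code from the pull request: RegexpExtractSketch, validateLiteral and tryCompile are hypothetical names, the ConcurrentHashMap only stands in for REGEXP_PATTERN_CACHE, and the Optional returned by validateLiteral stands in for whatever the input type strategy would pass to callContext.fail(...).

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical sketch, not Flink code.
class RegexpExtractSketch {

    // Stand-in for REGEXP_PATTERN_CACHE (assumption). Unbounded here for brevity;
    // failed compilations are cached as Optional.empty() so they are not retried per record.
    private static final Map<String, Optional<Pattern>> CACHE = new ConcurrentHashMap<>();

    /** Plan-time check: returns an error message for an invalid literal, empty otherwise. */
    static Optional<String> validateLiteral(String regex) {
        try {
            Pattern.compile(regex);
            return Optional.empty();
        } catch (PatternSyntaxException e) {
            return Optional.of("Invalid regular expression '" + regex + "': " + e.getDescription());
        }
    }

    /** Runtime path: compiles each distinct regex once and returns null on any failure, without logging. */
    static String regexpExtract(String str, String regex, int group) {
        if (str == null || regex == null) {
            return null;
        }
        Optional<Pattern> pattern = CACHE.computeIfAbsent(regex, RegexpExtractSketch::tryCompile);
        if (!pattern.isPresent()) {
            return null;
        }
        Matcher matcher = pattern.get().matcher(str);
        if (matcher.find() && group >= 0 && group <= matcher.groupCount()) {
            return matcher.group(group);
        }
        return null;
    }

    private static Optional<Pattern> tryCompile(String regex) {
        try {
            return Optional.of(Pattern.compile(regex));
        } catch (PatternSyntaxException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(validateLiteral("(").isPresent());
        System.out.println(regexpExtract("id=42", "id=(\\d+)", 1));
        System.out.println(regexpExtract("id=42", "(", 1));
    }
}
```

With this split, a literal pattern such as '(' would be rejected once during planning, while a non-literal pattern that turns out to be invalid at runtime yields null for each record without producing a stack trace.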