Ramin Gharib created FLINK-39649:
------------------------------------

             Summary: REGEXP_EXTRACT plan-time validation and hot-path log 
cleanup
                 Key: FLINK-39649
                 URL: https://issues.apache.org/jira/browse/FLINK-39649
             Project: Flink
          Issue Type: Sub-task
          Components: Table SQL / API, Table SQL / Planner, Table SQL / Runtime
            Reporter: Ramin Gharib


SqlFunctionUtils.regexpExtract compiles the regex for every record and emits 
LOG.error each time compilation throws PatternSyntaxException. When the pattern 
is a string literal, it is already known at planning time, so an invalid 
pattern can be rejected there.
h3. Reproducer

 
{code:sql}
SELECT REGEXP_EXTRACT(payload, '(', 1) FROM src;
{code}
 

'(' is an unbalanced group. The job plans successfully and the runtime emits 
one stack trace per record processed.
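
For illustration only, the compile failure behind the reproducer can be seen with plain java.util.regex, outside Flink. The class and method names below are hypothetical and are not part of Flink.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper, not Flink code: shows that '(' cannot be compiled.
final class UnbalancedGroupDemo {

    /** Returns the compile error description, or null if the regex compiles. */
    static String compileError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            // The same exception type that the runtime currently logs per record.
            return e.getDescription();
        }
    }

    public static void main(String[] args) {
        System.out.println("'('   -> " + compileError("("));
        System.out.println("'(a)' -> " + compileError("(a)"));
    }
}
```

Because the outcome depends only on the pattern string, every record hits the same failure when the pattern is a literal.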
h3. Fix
 # Add RegexpExtractInputTypeStrategy. It compiles a literal regex during 
inferInputTypes and fails via callContext.fail(...) if the pattern is invalid.
 # Route BuiltInFunctionDefinitions.REGEXP_EXTRACT through it.
 # Update SqlFunctionUtils.regexpExtract to use REGEXP_PATTERN_CACHE and 
silently return null on compile failure, with no LOG.error on the hot path.
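
A minimal sketch of the two halves of the fix, written against plain java.util.regex rather than Flink's classes. Everything here is an assumption made for illustration: RegexpExtractSketch, validateLiteral, and the CACHE map are hypothetical stand-ins for RegexpExtractInputTypeStrategy, callContext.fail(...), and REGEXP_PATTERN_CACHE. The extraction semantics (find, then return the requested group) and the caching of failed compiles are also assumptions, not confirmed behavior.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical sketch, not the Flink implementation.
final class RegexpExtractSketch {

    // Stand-in for REGEXP_PATTERN_CACHE. Unbounded here for brevity.
    // Optional.empty() records a pattern that failed to compile (an assumption).
    private static final ConcurrentHashMap<String, Optional<Pattern>> CACHE =
            new ConcurrentHashMap<>();

    /** Plan-time half: reject an invalid literal once, before the job runs. */
    static void validateLiteral(String regex) {
        try {
            Pattern.compile(regex);
        } catch (PatternSyntaxException e) {
            // Stand-in for failing the call during input type inference.
            throw new IllegalArgumentException(
                    "Invalid regular expression for REGEXP_EXTRACT: " + e.getMessage(), e);
        }
    }

    /** Runtime half: cached compile, null on failure, no logging per record. */
    static String regexpExtract(String str, String regex, int group) {
        if (str == null || regex == null) {
            return null;
        }
        Optional<Pattern> cached = CACHE.computeIfAbsent(regex, r -> {
            try {
                return Optional.of(Pattern.compile(r));
            } catch (PatternSyntaxException e) {
                return Optional.empty();
            }
        });
        if (!cached.isPresent()) {
            return null; // non-literal pattern that turned out to be invalid
        }
        Matcher m = cached.get().matcher(str);
        if (m.find() && group >= 0 && group <= m.groupCount()) {
            return m.group(group);
        }
        return null;
    }
}
```

With this split, a literal such as '(' would be rejected once during planning, while patterns that only become known at runtime (for example, values from a column) degrade to null without logging.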



