dylanhz commented on code in PR #28189:
URL: https://github.com/apache/flink/pull/28189#discussion_r3266180383
##########
flink-table/flink-table-runtime/src/main/java/org/apache/flink/table/runtime/functions/SqlFunctionUtils.java:
##########
@@ -422,21 +422,20 @@ public static String splitIndex(String str, int
character, int index) {
/**
* Returns a string resulting from replacing all substrings that match the
regular expression
- * with replacement.
+ * with replacement. Literal regexes are validated at planning time by the
input type strategy.
*/
public static String regexpReplace(String str, String regex, String
replacement) {
if (str == null || regex == null || replacement == null) {
return null;
}
try {
- return str.replaceAll(regex,
Matcher.quoteReplacement(replacement));
- } catch (Exception e) {
- LOG.error(
- String.format(
- "Exception in regexpReplace('%s', '%s', '%s')",
- str, regex, replacement),
- e);
- // return null if exception in regex replace
+ return REGEXP_PATTERN_CACHE
Review Comment:
> not sure we need it
>
> instead of this better to make it reuse on code gen level then no need to
even enter/invoke same method second time
I’d lean towards handling this separately, ideally with a benchmark.
This PR already removes the more expensive per-record `Pattern.compile`.
Skipping `REGEXP_PATTERN_CACHE.get(...)` for literal regexps seems like a
narrower optimization, and may add some complexity to the function/codegen
path. If we pursue it, it would be better to apply it consistently across the
regexp family in a follow-up PR.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]