alamb commented on issue #8051: URL: https://github.com/apache/arrow-datafusion/issues/8051#issuecomment-1979747257
BTW, now that @jayzhan211 and I have implemented `ScalarUDF::simplify` in https://github.com/apache/arrow-datafusion/pull/9298 and have ported the regular expression functions to use `ScalarUDF`, I think we could actually use that API to implement precompiled functions.

Not sure if that would meet your requirements, @thinkharderdev.

For example, to implement "precompiled regexp functions" we could do something like this (it would be sweet if someone wanted to prototype this):

```rust
/// A new UDF that has a precompiled pattern
struct PrecompiledRegexpReplace {
    precompiled_match: Arc<Pattern>,
}

impl ScalarUDFImpl for PrecompiledRegexpReplace {
    // invoke function uses `self.precompiled_match` directly
    ...
}

// Update the existing RegexpReplace function to implement `simplify`
impl ScalarUDFImpl for RegexpReplace {
    /// If the pattern argument is a scalar, rewrite the function to a new scalar UDF
    /// that contains a pre-compiled regular expression
    fn simplify(&self) .. {
        match (args[1], args[2]) {
            (ScalarValue::Utf8(pattern), ScalarValue::Utf8(flags)) => {
                let precompiled = // create regexp match
                Simplified::Rewritten(
                    ScalarUDF::new(PrecompiledRegexpReplace { precompiled_match: precompiled })
                        .call(args),
                )
            }
            _ => Simplified::Original(args),
        }
    }
}
```

We could then run some gnarly regular expression case, such as the one in https://github.com/apache/arrow-datafusion/issues/8492, and see if it helps or not. If it doesn't help performance, then the extra complexity isn't worth it for `regexp_replace`.
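To make the shape of the rewrite concrete, here is a minimal standalone sketch that uses only the standard library. Everything in it is a hypothetical stand-in rather than DataFusion's API: `Expr`, `Simplified`, `Pattern`, and `PrecompiledReplace` are toy types, and `Pattern::compile` only stores a literal substring where a real version would build a regular expression. It shows just the decision itself: compile once at planning time when the pattern argument is a literal, otherwise leave the call untouched.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a compiled regular expression.
/// It matches a literal substring only; a real version would wrap a regex type.
#[derive(Debug)]
struct Pattern {
    literal: String,
}

impl Pattern {
    /// Marks where the expensive compilation step would happen.
    fn compile(source: &str) -> Pattern {
        Pattern { literal: source.to_string() }
    }

    fn replace_all(&self, input: &str, replacement: &str) -> String {
        input.replace(&self.literal, replacement)
    }
}

/// Toy expression type: a string literal or a column reference (assumption).
#[derive(Debug, Clone)]
enum Expr {
    Literal(String),
    Column(String),
}

/// Toy UDF that carries its pattern, compiled once at planning time.
#[derive(Debug)]
struct PrecompiledReplace {
    precompiled_match: Arc<Pattern>,
}

impl PrecompiledReplace {
    /// Uses `self.precompiled_match` directly; nothing is compiled per call.
    fn invoke(&self, input: &str, replacement: &str) -> String {
        self.precompiled_match.replace_all(input, replacement)
    }
}

/// Toy result of a simplification pass (assumption, not DataFusion's enum).
#[derive(Debug)]
enum Simplified {
    /// Nothing could be precomputed; the arguments are returned unchanged.
    Original(Vec<Expr>),
    /// The call was rewritten to a precompiled UDF plus its remaining arguments.
    Rewritten(PrecompiledReplace, Vec<Expr>),
}

/// Arguments are `[input, pattern, replacement]`. The call is rewritten only
/// when the pattern is a literal that is known at planning time.
fn simplify(args: Vec<Expr>) -> Simplified {
    if let [input, Expr::Literal(pattern), replacement] = args.as_slice() {
        let precompiled_match = Arc::new(Pattern::compile(pattern));
        return Simplified::Rewritten(
            PrecompiledReplace { precompiled_match },
            vec![input.clone(), replacement.clone()],
        );
    }
    Simplified::Original(args)
}
```

In this sketch a call whose pattern is a column falls through to `Simplified::Original`, which is the case where per-row compilation cannot be avoided. Whether the rewritten path is actually faster would still need to be measured.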
