alamb commented on code in PR #17195:
URL: https://github.com/apache/datafusion/pull/17195#discussion_r2280428090


##########
datafusion/functions/src/regex/regexpreplace.rs:
##########
@@ -94,14 +98,30 @@ impl Default for RegexpReplaceFunc {
 
 impl RegexpReplaceFunc {
     pub fn new() -> Self {
-        use DataType::*;
+        use TypeSignature::*;
+        use TypeSignatureClass::*;
         Self {
             signature: Signature::one_of(
                 vec![
-                    TypeSignature::Exact(vec![Utf8, Utf8, Utf8]),

Review Comment:
   I personally recommend supporting only "all UTF8" , "all largeUtf8" and "all 
Utf8View"
   
   Supporting all combinations I think is likely to generate lots of code that 
will make the binary bigger but not be used by anyone



##########
datafusion/functions/src/regex/regexpreplace.rs:
##########
@@ -590,44 +638,1674 @@ pub fn specialize_regexp_replace<T: OffsetSizeTrait>(
                 .map(|arg| arg.to_array(inferred_length))
                 .collect::<Result<Vec<_>>>()?;
 
-            match args[0].data_type() {
-                DataType::Utf8View => {
-                    let string_array = args[0].as_string_view();
-                    let pattern_array = args[1].as_string::<i32>();
-                    let replacement_array = args[2].as_string::<i32>();
-                    regexp_replace::<i32, _, _>(
-                        string_array,
-                        pattern_array,
-                        replacement_array,
-                        args.get(3),
-                    )
-                }
-                DataType::Utf8 => {
-                    let string_array = args[0].as_string::<i32>();
-                    let pattern_array = args[1].as_string::<i32>();
-                    let replacement_array = args[2].as_string::<i32>();
-                    regexp_replace::<i32, _, _>(
-                        string_array,
-                        pattern_array,
-                        replacement_array,
-                        args.get(3),
-                    )
-                }
-                DataType::LargeUtf8 => {
-                    let string_array = args[0].as_string::<i64>();
-                    let pattern_array = args[1].as_string::<i64>();
-                    let replacement_array = args[2].as_string::<i64>();
-                    regexp_replace::<i64, _, _>(
-                        string_array,
-                        pattern_array,
-                        replacement_array,
-                        args.get(3),
-                    )
+            if args.get(3).is_none() {

Review Comment:
   Each of the branches in this function generates its own copy of the function 
(so this is generating many many different function copies
   
   I think it makes sense to create versions where each of the parameters is 
the same, like
   
   ```
   (DataType::Utf8View, DataType::Utf8View, DataType::Utf8View)
   (DataType::Utf8, DataType::Utf8, DataType::Utf8)
   (DataType::LargeUtf8, DataType::LargeUtf8, DataType::LargeUtf8)
   ```
   
   I don't know how often anyone would need the other combinations 🤔 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to