edponce commented on a change in pull request #11233:
URL: https://github.com/apache/arrow/pull/11233#discussion_r720710853



##########
File path: cpp/src/arrow/compute/kernels/scalar_string.cc
##########
@@ -1132,52 +1132,51 @@ void AddMatchSubstring(FunctionRegistry* registry) {
   {
     auto func = std::make_shared<ScalarFunction>("match_substring", 
Arity::Unary(),
                                                  &match_substring_doc);
-    auto exec_32 = MatchSubstring<StringType, PlainSubstringMatcher>::Exec;
-    auto exec_64 = MatchSubstring<LargeStringType, 
PlainSubstringMatcher>::Exec;
-    DCHECK_OK(func->AddKernel({utf8()}, boolean(), exec_32, 
MatchSubstringState::Init));
-    DCHECK_OK(
-        func->AddKernel({large_utf8()}, boolean(), exec_64, 
MatchSubstringState::Init));
+    for (const auto& ty : BaseBinaryTypes()) {
+      auto exec =
+          GenerateTypeAgnosticVarBinaryBase<MatchSubstring, 
PlainSubstringMatcher>(ty);
+      DCHECK_OK(func->AddKernel({ty}, boolean(), exec, 
MatchSubstringState::Init));
+    }
     DCHECK_OK(registry->AddFunction(std::move(func)));
   }
   {
     auto func = std::make_shared<ScalarFunction>("starts_with", Arity::Unary(),
                                                  &match_substring_doc);
-    auto exec_32 = MatchSubstring<StringType, PlainStartsWithMatcher>::Exec;
-    auto exec_64 = MatchSubstring<LargeStringType, 
PlainStartsWithMatcher>::Exec;
-    DCHECK_OK(func->AddKernel({utf8()}, boolean(), exec_32, 
MatchSubstringState::Init));
-    DCHECK_OK(
-        func->AddKernel({large_utf8()}, boolean(), exec_64, 
MatchSubstringState::Init));
+    for (const auto& ty : BaseBinaryTypes()) {
+      auto exec =
+          GenerateTypeAgnosticVarBinaryBase<MatchSubstring, 
PlainStartsWithMatcher>(ty);
+      DCHECK_OK(func->AddKernel({ty}, boolean(), exec, 
MatchSubstringState::Init));
+    }

Review comment:
       @lidavidm Thanks for all your feedback and I apologize for any 
confusion. I noticed that I have not been very clear in explaining myself, I 
mixed several concepts together. For instance, when I state that RE does not 
works for binary (even using Latin1) I meant in the case of `ignore_case = 
true` bc that is when RE2 is used in Arrow.
   
   I am going to verify more carefully the `ignore_case` and `length` usage, 
build up more tests, and organize my findings/thoughts.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to