lidavidm commented on a change in pull request #11233:
URL: https://github.com/apache/arrow/pull/11233#discussion_r720699208
##########
File path: cpp/src/arrow/compute/kernels/scalar_string.cc
##########
@@ -1132,52 +1132,51 @@ void AddMatchSubstring(FunctionRegistry* registry) {
{
auto func = std::make_shared<ScalarFunction>("match_substring",
Arity::Unary(),
&match_substring_doc);
- auto exec_32 = MatchSubstring<StringType, PlainSubstringMatcher>::Exec;
- auto exec_64 = MatchSubstring<LargeStringType,
PlainSubstringMatcher>::Exec;
- DCHECK_OK(func->AddKernel({utf8()}, boolean(), exec_32,
MatchSubstringState::Init));
- DCHECK_OK(
- func->AddKernel({large_utf8()}, boolean(), exec_64,
MatchSubstringState::Init));
+ for (const auto& ty : BaseBinaryTypes()) {
+ auto exec =
+ GenerateTypeAgnosticVarBinaryBase<MatchSubstring,
PlainSubstringMatcher>(ty);
+ DCHECK_OK(func->AddKernel({ty}, boolean(), exec,
MatchSubstringState::Init));
+ }
DCHECK_OK(registry->AddFunction(std::move(func)));
}
{
auto func = std::make_shared<ScalarFunction>("starts_with", Arity::Unary(),
&match_substring_doc);
- auto exec_32 = MatchSubstring<StringType, PlainStartsWithMatcher>::Exec;
- auto exec_64 = MatchSubstring<LargeStringType,
PlainStartsWithMatcher>::Exec;
- DCHECK_OK(func->AddKernel({utf8()}, boolean(), exec_32,
MatchSubstringState::Init));
- DCHECK_OK(
- func->AddKernel({large_utf8()}, boolean(), exec_64,
MatchSubstringState::Init));
+ for (const auto& ty : BaseBinaryTypes()) {
+ auto exec =
+ GenerateTypeAgnosticVarBinaryBase<MatchSubstring,
PlainStartsWithMatcher>(ty);
+ DCHECK_OK(func->AddKernel({ty}, boolean(), exec,
MatchSubstringState::Init));
+ }
Review comment:
Thanks.
Just to be clear: std::string can represent binary data; it can hold nulls
without issue. Internally, it's a pointer and a length. However, you can run
into null-termination issues if you ever use a constructor that is only a
pointer without a length. I suspect that's what's happening somewhere in our
stack. The `length` and `size` methods aren't calling `strlen` on the fly,
they're just returning the stored length.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]