felipecrv commented on code in PR #41827:
URL: https://github.com/apache/arrow/pull/41827#discussion_r1623498992
##########
cpp/src/arrow/compute/kernels/scalar_cast_string.cc:
##########
@@ -510,6 +510,60 @@ void AddBinaryToFixedSizeBinaryCast(CastFunction* func) {
AddBinaryToFixedSizeBinaryCast<FixedSizeBinaryType>(func);
}
+// ----------------------------------------------------------------------
+// Union to String
+
+template <typename O>
+struct UnionToStringCastFunctor {
+ using BuilderType = typename TypeTraits<O>::BuilderType;
+
+ static Status Exec(KernelContext* ctx, const ExecSpan& batch, ExecResult* out) {
+ const ArraySpan& input = batch[0].array;
+ const auto& union_type = checked_cast<const UnionType&>(*input.type);
+ const auto type_ids = input.GetValues<int8_t>(1);
+ const auto& offsets = input.GetValues<int32_t>(2);
+
+ BuilderType builder(input.type->GetSharedPtr(), ctx->memory_pool());
+ RETURN_NOT_OK(builder.Reserve(input.length));
+
+ for (int64_t i = 0; i < input.length; ++i) {
Review Comment:
> In this way, we can unify it with other type's implementations and shield
> the logic of converting strings in the current file.
True, but that can prevent optimizations in the future. Taking a scalar
function and turning it into an array function by mapping
(`array::map(scalar_function: scalar -> scalar) -> array`) is appealing, but
it prevents vectorization techniques.
UPDATE: mapping is what we will do here, because the set of unions and their
parametrizations is infinite. Even so, `StringFormatter<MonthIntervalType>` is
not the way to go, because it would have to `switch` on the type on every
invocation of the formatter.
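
To illustrate the concern about switching on the type for every invocation, here is a minimal, self-contained sketch. It does not use the Arrow API: `ToyUnion`, `ChildKind`, `FormatWithSwitch`, `ResolveFormatter`, and `CastToStrings` are hypothetical names invented for this example, and the toy union stores every child value as an `int32_t` for simplicity. The first function re-runs the type `switch` for each value; the second approach resolves one formatter per union child up front and then only indexes by type id inside the loop.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; not Arrow types.
enum class ChildKind : int8_t { kInt32 = 0, kBool = 1 };

struct ToyUnion {
  std::vector<ChildKind> child_kinds;  // one entry per union child
  std::vector<int8_t> type_ids;        // per-row index into child_kinds
  std::vector<int32_t> values;         // per-row payload (toy: int32 for every child)
};

// Approach A: the type switch runs again on every single value.
std::string FormatWithSwitch(ChildKind kind, int32_t v) {
  switch (kind) {
    case ChildKind::kInt32:
      return std::to_string(v);
    case ChildKind::kBool:
      return v != 0 ? "true" : "false";
  }
  return "";
}

using Formatter = std::function<std::string(int32_t)>;

// Approach B, step 1: the type switch runs once per union child.
Formatter ResolveFormatter(ChildKind kind) {
  switch (kind) {
    case ChildKind::kInt32:
      return [](int32_t v) { return std::to_string(v); };
    case ChildKind::kBool:
      return [](int32_t v) { return std::string(v != 0 ? "true" : "false"); };
  }
  return [](int32_t) { return std::string(); };
}

// Approach B, step 2: the per-row loop only looks up the pre-resolved formatter.
std::vector<std::string> CastToStrings(const ToyUnion& u) {
  std::vector<Formatter> formatters;
  formatters.reserve(u.child_kinds.size());
  for (ChildKind kind : u.child_kinds) {
    formatters.push_back(ResolveFormatter(kind));
  }

  std::vector<std::string> out;
  out.reserve(u.type_ids.size());
  for (std::size_t i = 0; i < u.type_ids.size(); ++i) {
    out.push_back(formatters[u.type_ids[i]](u.values[i]));
  }
  return out;
}
```

The loop in `CastToStrings` is still a scalar-to-scalar map, so it does not recover vectorization; it only moves the type dispatch out of the per-value path, which is the part of the cost the comment singles out.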