edponce commented on a change in pull request #11023:
URL: https://github.com/apache/arrow/pull/11023#discussion_r706358970
##########
File path: cpp/src/arrow/compute/kernels/scalar_string.cc
##########
@@ -2357,6 +2584,79 @@ void AddSplit(FunctionRegistry* registry) {
#endif
}
+template <typename Type1, typename Type2>
+struct StrRepeatTransform : public StringBinaryTransformBase {
+ using ArrayType1 = typename TypeTraits<Type1>::ArrayType;
+ using ArrayType2 = typename TypeTraits<Type2>::ArrayType;
+
+ int64_t MaxCodeunits(int64_t inputs, int64_t input_ncodeunits,
+ const std::shared_ptr<Scalar>& input2) override {
+ auto nrepeats = static_cast<int64_t>(UnboxScalar<Type2>::Unbox(*input2));
+ return std::max(input_ncodeunits * nrepeats, int64_t(0));
+ }
+
+ int64_t MaxCodeunits(int64_t inputs, int64_t input_ncodeunits,
+ const std::shared_ptr<ArrayData>& data2) override {
+ ArrayType2 array2(data2);
+ // Ideally, we would like to calculate the exact output size by iterating
over
+ // all strings offsets and summing each length multiplied by the
corresponding repeat
+ // value, but this requires traversing the data twice (now and during
transform).
+ // The upper limit is to assume that all strings are repeated the max
number of
+ // times knowing that a resize operation is performed at end of execution.
+ auto max_nrepeats =
+ static_cast<int64_t>(**std::max_element(array2.begin(), array2.end()));
+ return std::max(input_ncodeunits * max_nrepeats, int64_t(0));
+ }
+
+ int64_t Transform(const uint8_t* input, int64_t input_string_ncodeunits,
+ const std::shared_ptr<Scalar>& input2, uint8_t* output) {
+ auto nrepeats = static_cast<int64_t>(UnboxScalar<Type2>::Unbox(*input2));
+ uint8_t* output_start = output;
+ if (nrepeats > 0) {
+ // log2(k) approach
Review comment:
Ok, will fix the complexity to `l * log2(k)` where `l` is the length of
the input string.
In terms of performance, based on isolated benchmarks I performed comparing
several *copy* approaches, the log2 approach is faster for all cases where
`nrepeats >= 4`, and for `nrepeats < 4` it was not reasonably slower than
direct copies. In my initial PR, I had an `if-else` to handle this, but thought
that having the condition check for all values, in addition, to having two
approaches, was not better.
This circles back to some of my previous comments/ideas, that the Exec
methods should provide a mechanism for selecting kernel `Transform/Call`
variants based on these higher-level options. More on this very soon.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]