edponce commented on a change in pull request #11023:
URL: https://github.com/apache/arrow/pull/11023#discussion_r706358970



##########
File path: cpp/src/arrow/compute/kernels/scalar_string.cc
##########
@@ -2357,6 +2584,79 @@ void AddSplit(FunctionRegistry* registry) {
 #endif
 }
 
+template <typename Type1, typename Type2>
+struct StrRepeatTransform : public StringBinaryTransformBase {
+  using ArrayType1 = typename TypeTraits<Type1>::ArrayType;
+  using ArrayType2 = typename TypeTraits<Type2>::ArrayType;
+
+  int64_t MaxCodeunits(int64_t inputs, int64_t input_ncodeunits,
+                       const std::shared_ptr<Scalar>& input2) override {
+    auto nrepeats = static_cast<int64_t>(UnboxScalar<Type2>::Unbox(*input2));
+    return std::max(input_ncodeunits * nrepeats, int64_t(0));
+  }
+
+  int64_t MaxCodeunits(int64_t inputs, int64_t input_ncodeunits,
+                       const std::shared_ptr<ArrayData>& data2) override {
+    ArrayType2 array2(data2);
+    // Ideally, we would like to calculate the exact output size by iterating 
over
+    // all strings offsets and summing each length multiplied by the 
corresponding repeat
+    // value, but this requires traversing the data twice (now and during 
transform).
+    // The upper limit is to assume that all strings are repeated the max 
number of
+    // times knowing that a resize operation is performed at end of execution.
+    auto max_nrepeats =
+        static_cast<int64_t>(**std::max_element(array2.begin(), array2.end()));
+    return std::max(input_ncodeunits * max_nrepeats, int64_t(0));
+  }
+
+  int64_t Transform(const uint8_t* input, int64_t input_string_ncodeunits,
+                    const std::shared_ptr<Scalar>& input2, uint8_t* output) {
+    auto nrepeats = static_cast<int64_t>(UnboxScalar<Type2>::Unbox(*input2));
+    uint8_t* output_start = output;
+    if (nrepeats > 0) {
+      // log2(k) approach

Review comment:
       Ok, will fix the complexity to `O(log2(nrepeats))`
   
   In terms of performance, based on isolated benchmarks I performed comparing 
several *copy* approaches, the log2 approach is faster for all cases where 
`nrepeats >= 4`, and for `nrepeats < 4` it was not reasonably slower than 
direct copies. [In my initial PR, I had an `if-else` to handle 
this](https://github.com/apache/arrow/pull/11023/commits/a0e327d2751137a8b7d47ad524c848eef65066ff#diff-eb8300bc4dea7d1c46b2576b7dbd8e42b927ab7d42c031f4aecae892a72ee244R2596-R2612),
 but thought that having the condition check for all values, in addition, to 
having two approaches, was not better.
   
   This circles back to some of my previous comments/ideas, that the Exec 
methods should provide a mechanism for selecting kernel `Transform/Call` 
variants based on these higher-level options. More on this very soon.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to