edponce commented on a change in pull request #11023:
URL: https://github.com/apache/arrow/pull/11023#discussion_r706354519
##########
File path: cpp/src/arrow/compute/kernels/scalar_string.cc
##########
@@ -2357,6 +2584,79 @@ void AddSplit(FunctionRegistry* registry) {
#endif
}
+template <typename Type1, typename Type2>
+struct StrRepeatTransform : public StringBinaryTransformBase {
+ using ArrayType1 = typename TypeTraits<Type1>::ArrayType;
+ using ArrayType2 = typename TypeTraits<Type2>::ArrayType;
+
+ int64_t MaxCodeunits(int64_t inputs, int64_t input_ncodeunits,
+ const std::shared_ptr<Scalar>& input2) override {
+ auto nrepeats = static_cast<int64_t>(UnboxScalar<Type2>::Unbox(*input2));
+ return std::max(input_ncodeunits * nrepeats, int64_t(0));
+ }
+
+ int64_t MaxCodeunits(int64_t inputs, int64_t input_ncodeunits,
+ const std::shared_ptr<ArrayData>& data2) override {
+ ArrayType2 array2(data2);
+ // Ideally, we would like to calculate the exact output size by iterating over
+ // all strings offsets and summing each length multiplied by the corresponding repeat
+ // value, but this requires traversing the data twice (now and during transform).
+ // The upper limit is to assume that all strings are repeated the max number of
+ // times knowing that a resize operation is performed at end of execution.
Review comment:
Ok, I will compute the exact output size. On the bright side, this
bypasses the resizing step at the end of Exec.
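For comparison, here is a minimal standalone sketch of the two sizing strategies discussed in the diff comment and the reply: the upper bound (total input code units times the largest repeat count) and the exact size (sum of each string's length times its own repeat count). It assumes Arrow-style `int32_t` offsets with `num_strings + 1` entries and one `int64_t` repeat count per string. Negative counts are clamped to zero, mirroring the `std::max(..., int64_t(0))` in the scalar `MaxCodeunits` overload. Both function names are hypothetical and are not part of Arrow's API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: exact number of output code units for a kernel that
// repeats string i exactly repeats[i] times. Requires one extra pass over the
// offsets and repeat counts, but the output buffer never needs a final resize.
int64_t ExactRepeatOutputSize(const std::vector<int32_t>& offsets,
                              const std::vector<int64_t>& repeats) {
  assert(!offsets.empty() && offsets.size() == repeats.size() + 1);
  int64_t total = 0;
  for (size_t i = 0; i + 1 < offsets.size(); ++i) {
    const int64_t length = offsets[i + 1] - offsets[i];
    // Negative repeat counts produce an empty string.
    total += length * std::max<int64_t>(repeats[i], 0);
  }
  return total;
}

// Hypothetical helper: the upper bound described in the diff comment, which
// assumes every string is repeated the maximum number of times. Cheaper to
// compute, but the output may have to be shrunk after the transform.
int64_t UpperBoundRepeatOutputSize(const std::vector<int32_t>& offsets,
                                   const std::vector<int64_t>& repeats) {
  assert(!offsets.empty() && offsets.size() == repeats.size() + 1);
  const int64_t input_ncodeunits = offsets.back() - offsets.front();
  int64_t max_repeats = 0;  // also clamps all-negative counts to zero
  for (int64_t r : repeats) max_repeats = std::max(max_repeats, r);
  return input_ncodeunits * max_repeats;
}
```

For offsets `{0, 3, 5, 9}` (string lengths 3, 2 and 4) with repeat counts `{2, 0, 3}`, the exact size is 3*2 + 2*0 + 4*3 = 18 code units, while the upper bound is 9*3 = 27. The gap is the over-allocation that the resize at the end of execution would otherwise have to trim.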
--
This is an automated message from the Apache Git Service.