@Antoine I created this draft PR for the reverse impl. [1] I felt StringTransform was the most suitable out of the "util classes" available for strings (binary data).
I feel like there is a kind of hacky workaround to this. We could initialize encoded_nbytes to input_string_ncodeunits here [2] :-) [1] https://github.com/apache/arrow/pull/10317 [2] https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/scalar_string.cc#L172 On Thu, May 13, 2021 at 3:02 PM Antoine Pitrou <[email protected]> wrote: > > Hi Niranda, > > Le 13/05/2021 à 21:00, Niranda Perera a écrit : > > > > I am writing a String Reverse kernel [1]. I am extending the > > arrow::compute::internal::StringTransform class [2] for my impl. > > StringTransform is expecting the Derived class to implement the following > > Tranfsform method. > > > > bool Transform(const uint8_t* input, offset_type input_string_ncodeunits, > > uint8_t* output, offset_type* output_written) > > > > I believe the output_written pointer expects the Transform method to > > indicate StringTransform about "the number of bytes written to the output > > buffer". > > But in the reverse kernel, this value is predefined (*output_written = > > input_string_ncodeunits). Would it be useful to specialize > StringTransform > > to handle such cases where input size == output size? > > You don't have to reuse StringTransform if another approach would be > better (e.g. more performant). The main concerns here would be > concision, maintainability, performance (in whatever order). > > Regards > > Antoine. > -- Niranda Perera https://niranda.dev/ @n1r44 <https://twitter.com/N1R44>
