Re: Extending arrow::compute::internal::StringTransform class

Niranda Perera Thu, 13 May 2021 12:25:03 -0700

@Antoine I created this draft PR for the reverse impl. [1]

I felt StringTransform was the most suitable out of the "util classes"
available for strings (binary data).


I feel like there is a kind of hacky workaround to this. We could
initialize encoded_nbytes to input_string_ncodeunits here [2] :-)

[1] https://github.com/apache/arrow/pull/10317
[2]
https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/scalar_string.cc#L172

On Thu, May 13, 2021 at 3:02 PM Antoine Pitrou <[email protected]> wrote:

>
> Hi Niranda,
>
> Le 13/05/2021 à 21:00, Niranda Perera a écrit :
> >
> > I am writing a String Reverse kernel [1]. I am extending the
> > arrow::compute::internal::StringTransform class [2] for my impl.
> > StringTransform is expecting the Derived class to implement the following
> > Tranfsform method.
> >
> > bool Transform(const uint8_t* input, offset_type input_string_ncodeunits,
> > uint8_t* output, offset_type* output_written)
> >
> > I believe the output_written pointer expects the Transform method to
> > indicate StringTransform about "the number of bytes written to the output
> > buffer".
> > But in the reverse kernel, this value is predefined (*output_written =
> > input_string_ncodeunits). Would it be useful to specialize
> StringTransform
> > to handle such cases where input size == output size?
>
> You don't have to reuse StringTransform if another approach would be
> better (e.g. more performant).  The main concerns here would be
> concision, maintainability, performance (in whatever order).
>
> Regards
>
> Antoine.
>


-- 
Niranda Perera
https://niranda.dev/
@n1r44 <https://twitter.com/N1R44>

Re: Extending arrow::compute::internal::StringTransform class

Reply via email to