[
https://issues.apache.org/jira/browse/ARROW-13570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Eduardo Ponce updated ARROW-13570:
----------------------------------
Description:
Some ASCII scalar string kernels can reuse the original offsets buffer, so the
offsets are not preallocated in the output (these kernels are registered with
*MemAllocation::NO_PREALLOCATE*). Currently, only kernels that apply a
transformation to each character independently via
[StringDataTransform|https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/scalar_string.cc#L590-L631]
support the no-preallocation policy. However, there are additional string
kernels that do not modify the length (nor the offsets) of the input string but
apply scalar transforms that depend on neighboring characters.
This issue should extend *StringDataTransform* (or create a variant of it) to
take multiple input transforms in order to support the
*MemAllocation::NO_PREALLOCATE* policy for additional scalar ASCII kernels
(e.g., _ascii_title_).
was:
Some ASCII scalar string kernels can reuse the original offsets buffer, so the
offsets are not preallocated in the output (these kernels are registered with
*MemAllocation::NO_PREALLOCATE*). Currently, only kernels that apply a
transformation to each character independently via
[StringDataTransform|https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/scalar_string.cc#L590-L631]
support the no-preallocation policy. However, there are additional string
kernels that do not modify the length (nor the offsets) of the input string but
apply different transforms throughout the characters.
This issue should extend *StringDataTransform* (or create a variant of it) to
take multiple input transforms in order to support the
*MemAllocation::NO_PREALLOCATE* policy for additional scalar ASCII kernels
(e.g., _ascii_title_).
> [C++][Compute] Additional scalar ASCII kernels can reuse original offsets
> buffer
> --------------------------------------------------------------------------------
>
> Key: ARROW-13570
> URL: https://issues.apache.org/jira/browse/ARROW-13570
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Eduardo Ponce
> Priority: Major
> Fix For: 6.0.0
>
>
> Some ASCII scalar string kernels can reuse the original offsets buffer, so
> the offsets are not preallocated in the output (these kernels are registered
> with *MemAllocation::NO_PREALLOCATE*). Currently, only kernels that apply a
> transformation to each character independently via
> [StringDataTransform|https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/scalar_string.cc#L590-L631]
> support the no-preallocation policy. However, there are additional string
> kernels that do not modify the length (nor the offsets) of the input string
> but apply scalar transforms that depend on neighboring characters.
> This issue should extend *StringDataTransform* (or create a variant of it) to
> take multiple input transforms in order to support the
> *MemAllocation::NO_PREALLOCATE* policy for additional scalar ASCII kernels
> (e.g., _ascii_title_).
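For illustration only, here is a minimal standalone sketch of why a title-casing transform can keep the input offsets unchanged. It does not use the Arrow API: `StringColumn` and `AsciiTitleSketch` are hypothetical names, the offsets are copied rather than shared as a buffer, and the word-boundary rule (a letter is uppercased unless the previous byte in the same value is a letter) is a simplification that may not match the exact semantics of _ascii_title_.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a string array: concatenated bytes plus offsets.
// Value i occupies data[offsets[i] .. offsets[i + 1]).
struct StringColumn {
  std::string data;
  std::vector<int32_t> offsets;
};

inline bool IsAsciiAlpha(unsigned char c) {
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

inline char AsciiUpper(unsigned char c) {
  return static_cast<char>((c >= 'a' && c <= 'z') ? c - 32 : c);
}

inline char AsciiLower(unsigned char c) {
  return static_cast<char>((c >= 'A' && c <= 'Z') ? c + 32 : c);
}

// Simplified title casing. Every output byte is written at the same position
// as its input byte, so value lengths and offsets are unchanged even though
// each byte's result depends on the byte before it.
StringColumn AsciiTitleSketch(const StringColumn& in) {
  StringColumn out;
  out.offsets = in.offsets;         // reused as-is (a real kernel would share the buffer)
  out.data.resize(in.data.size());  // data buffer has the same size as the input
  for (std::size_t i = 0; i + 1 < in.offsets.size(); ++i) {
    bool prev_is_alpha = false;     // reset at each value boundary
    for (int32_t pos = in.offsets[i]; pos < in.offsets[i + 1]; ++pos) {
      const unsigned char c = static_cast<unsigned char>(in.data[pos]);
      out.data[pos] = prev_is_alpha ? AsciiLower(c) : AsciiUpper(c);
      prev_is_alpha = IsAsciiAlpha(c);
    }
  }
  return out;
}
```

The per-value reset of `prev_is_alpha` is the part a purely per-character transform cannot express: the state must not leak from the last byte of one value into the first byte of the next.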