[
https://issues.apache.org/jira/browse/ARROW-16562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated ARROW-16562:
-----------------------------------
Labels: pull-request-available (was: )
> [C++] Avoid slicing array inputs in ExecBatchIterator that would result in
> one slice
> ------------------------------------------------------------------------------------
>
> Key: ARROW-16562
> URL: https://issues.apache.org/jira/browse/ARROW-16562
> Project: Apache Arrow
> Issue Type: Sub-task
> Components: C++
> Reporter: Tobias Zagorni
> Assignee: Tobias Zagorni
> Priority: Minor
> Labels: pull-request-available
> Attachments: avoid-slicing-performance.txt
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> For scalar functions, {{ExecBatchIterator}} is used to iterate over batches
> in smaller units. It is implemented by calling {{Array::slice()}}. For
> small batches, this is unnecessary, since only one slice is created. The slice
> operation still causes some overhead by copying the shared_ptrs of the
> ArrayData object, including the type pointer, which can lead to contention
> (ARROW-16161).
> This patch first checks whether the batch size is smaller than the slice size,
> and uses std::move in that case.
> I have attached a comparison of the ExecuteScalarExpressionOverhead
> benchmark here: [^avoid-slicing-performance.txt]
> (created with --benchmark_min_time=20; the standard, shorter runtime tends to
> be noisy for this benchmark, but it also shows a positive tendency)
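
The idea described above can be illustrated with a minimal, self-contained sketch. It does not use the real Arrow classes: `DataType`, `Buffer`, `ArrayData`, and `FirstChunk` below are hypothetical stand-ins, and the `<=` comparison is an assumption about the exact boundary condition. The point is only that slicing copies every shared_ptr (touching atomic reference counts), while moving the input transfers ownership without doing so.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; these are NOT the Arrow classes.
struct DataType {};
struct Buffer {};

struct ArrayData {
  std::shared_ptr<DataType> type;
  std::vector<std::shared_ptr<Buffer>> buffers;
  int64_t length = 0;
  int64_t offset = 0;

  // Zero-copy slice: the underlying data is shared, but each shared_ptr
  // (including the type pointer) is copied, incrementing its atomic refcount.
  ArrayData Slice(int64_t off, int64_t len) const {
    ArrayData out;
    out.type = type;
    out.buffers = buffers;
    out.length = len;
    out.offset = offset + off;
    return out;
  }
};

// Returns the first chunk of `input`, at most `max_chunk_size` elements long.
// If the whole input fits into one chunk, it is moved instead of sliced, so
// no reference counts are touched. (Assumed boundary condition: <=.)
ArrayData FirstChunk(ArrayData&& input, int64_t max_chunk_size) {
  assert(max_chunk_size > 0);
  if (input.length <= max_chunk_size) {
    return std::move(input);
  }
  return input.Slice(0, max_chunk_size);
}
```

In this sketch, a 10-element input with a chunk size of 1024 is passed through by move and the caller's copy is left empty, whereas a chunk size of 4 falls back to `Slice` and adds one more owner of the type pointer.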