Ian Cook created ARROW-12042:
--------------------------------
Summary: [C++] Change or rationalize output of array_sort_indices
on ChunkedArray
Key: ARROW-12042
URL: https://issues.apache.org/jira/browse/ARROW-12042
Project: Apache Arrow
Issue Type: Task
Components: C++
Affects Versions: 3.0.0
Reporter: Ian Cook
Currently when the {{array_sort_indices}} compute function is called on a
ChunkedArray of two or more Arrays, it returns a ChunkedArray of Arrays of
_local_ sort indices for each Array. The following example demonstrates this
with the R bindings (note that these R examples will not run until ARROW-11703
is merged):
{code:java}
> x <- ChunkedArray$create(c(2L, 1L), c(4L, 3L))
> arrow:::call_function("array_sort_indices", x, options = list(order = TRUE))
ChunkedArray
[
[
0,
1
],
[
0,
1
]
]
{code}
Compare to the {{sort_indices}} compute function which returns an Array of
_global_ sort indices in this case:
{code:java}
> arrow:::call_function("sort_indices", x, options = list(names = "", orders = 1L))
Array
<uint64>
[
2,
3,
0,
1
]
{code}
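To make the local/global distinction concrete, here is a minimal plain-Python sketch of the two index semantics. It does not call Arrow at all: the helper names {{local_sort_indices}} and {{global_sort_indices}} are hypothetical, chunks are modeled as plain lists, and the descending order is an assumption chosen because it reproduces the two outputs shown above.

```python
# Plain-Python illustration only; this is NOT the Arrow API.
# Both helper names are hypothetical and exist just for this sketch.

def local_sort_indices(chunks, descending=False):
    """Sort each chunk independently; indices are relative to that chunk."""
    return [
        sorted(range(len(chunk)), key=chunk.__getitem__, reverse=descending)
        for chunk in chunks
    ]


def global_sort_indices(chunks, descending=False):
    """Sort across all chunks; indices refer to the concatenated values."""
    flat = [value for chunk in chunks for value in chunk]
    return sorted(range(len(flat)), key=flat.__getitem__, reverse=descending)


# Same data as the ChunkedArray in the R examples: two chunks, (2, 1) and (4, 3).
chunks = [[2, 1], [4, 3]]

# Assumption: descending order, which matches the outputs quoted in this issue.
print(local_sort_indices(chunks, descending=True))   # [[0, 1], [0, 1]]
print(global_sort_indices(chunks, descending=True))  # [2, 3, 0, 1]
```

The local result cannot be used to reorder the ChunkedArray as a whole, because each inner list only orders its own chunk; the global result can.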
Is this behavior deliberate? If so, we should document it clearly. If not, we
should change it.
Note that the docs currently state that {{array_sort_indices}} only works on
Arrays [https://arrow.apache.org/docs/cpp/compute.html#sorts-and-partitions]
(see note (4)), but evidently that is not exactly correct.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)