Ian Cook created ARROW-12042:
--------------------------------

             Summary: [C++] Change or rationalize output of array_sort_indices 
on ChunkedArray
                 Key: ARROW-12042
                 URL: https://issues.apache.org/jira/browse/ARROW-12042
             Project: Apache Arrow
          Issue Type: Task
          Components: C++
    Affects Versions: 3.0.0
            Reporter: Ian Cook


Currently when the {{array_sort_indices}} compute function is called on a 
ChunkedArray of two or more Arrays, it returns a ChunkedArray of Arrays of 
_local_ sort indices for each Array. Demonstrating this with the R bindings 
(but note that these R examples will not run until ARROW-11703 is merged):
{code:java}
> x <- ChunkedArray$create(c(2L, 1L), c(4L, 3L))
> arrow:::call_function("array_sort_indices", x, options = list(order = TRUE))
ChunkedArray
[
  [
    0,
    1
  ],
  [
    0,
    1
  ]
]
{code}
Compare this to the {{sort_indices}} compute function, which returns an Array of 
_global_ sort indices in this case:
{code:java}
> arrow:::call_function("sort_indices", x, options = list(names = "", orders = 1L))
Array
<uint64>
[
  2,
  3,
  0,
  1
]
{code}
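To make the local/global distinction concrete, here is a minimal standalone sketch in plain C++ (standard library only). It is not Arrow's implementation and does not use the Arrow API; the names {{Chunk}}, {{SortIndicesOfChunk}}, {{LocalSortIndices}} and {{GlobalSortIndices}} are hypothetical and exist only for this illustration. Descending order is used because that is what the two R calls above appear to request.

```cpp
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical stand-in for one Array (chunk) of a ChunkedArray.
using Chunk = std::vector<int32_t>;

// Stable argsort of a single contiguous chunk.
// Returned indices are positions within `values`.
std::vector<uint64_t> SortIndicesOfChunk(const Chunk& values, bool descending) {
  std::vector<uint64_t> indices(values.size());
  std::iota(indices.begin(), indices.end(), uint64_t{0});
  std::stable_sort(indices.begin(), indices.end(),
                   [&values, descending](uint64_t a, uint64_t b) {
                     return descending ? values[a] > values[b]
                                       : values[a] < values[b];
                   });
  return indices;
}

// "Local" behavior: each chunk is sorted independently, so every
// result chunk holds indices relative to its own chunk.
std::vector<std::vector<uint64_t>> LocalSortIndices(
    const std::vector<Chunk>& chunks, bool descending) {
  std::vector<std::vector<uint64_t>> out;
  out.reserve(chunks.size());
  for (const Chunk& chunk : chunks) {
    out.push_back(SortIndicesOfChunk(chunk, descending));
  }
  return out;
}

// "Global" behavior: the chunks are treated as one logical array, so
// the result holds positions in the concatenation of all chunks.
// (Copying into one flat vector is the simplest way to show this; it
// is an assumption of the sketch, not a claim about how Arrow does it.)
std::vector<uint64_t> GlobalSortIndices(const std::vector<Chunk>& chunks,
                                        bool descending) {
  Chunk flat;
  for (const Chunk& chunk : chunks) {
    flat.insert(flat.end(), chunk.begin(), chunk.end());
  }
  return SortIndicesOfChunk(flat, descending);
}
```

For the chunks (2, 1) and (4, 3) with {{descending = true}}, {{LocalSortIndices}} yields [0, 1] and [0, 1], while {{GlobalSortIndices}} yields [2, 3, 0, 1]. These match the two outputs shown above. The local result cannot be used to reorder the ChunkedArray as a whole without also knowing the chunk offsets.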
Is this behavior deliberate? If so, we should document it clearly. If not, we 
should change it.

Note that the docs currently state that {{array_sort_indices}} only works on 
Arrays [https://arrow.apache.org/docs/cpp/compute.html#sorts-and-partitions] 
(see note (4)), but evidently that is not exactly correct.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)