jorisvandenbossche commented on a change in pull request #8805:
URL: https://github.com/apache/arrow/pull/8805#discussion_r542510825



##########
File path: python/pyarrow/tests/test_compute.py
##########
@@ -1017,3 +1017,51 @@ def test_partition_nth():
                for i in range(pivot))
     assert all(data[indices[i]] >= data[indices[pivot]]
                for i in range(pivot, len(data)))
+
+
+def test_array_sort_indices():
+    arr = pa.array([1, 2, None, 0])
+    result = pc.array_sort_indices(arr)
+    assert result.to_pylist() == [3, 0, 1, 2]
+    result = pc.array_sort_indices(arr, order="ascending")
+    assert result.to_pylist() == [3, 0, 1, 2]
+    result = pc.array_sort_indices(arr, order="descending")
+    assert result.to_pylist() == [1, 0, 3, 2]
+
+    with pytest.raises(ValueError, match="not a valid order"):
+        pc.array_sort_indices(arr, order="nonscending")
+
+
+def test_sort_indices_array():
+    arr = pa.array([1, 2, None, 0])
+    result = pc.sort_indices(arr)
+    assert result.to_pylist() == [3, 0, 1, 2]
+    result = pc.sort_indices(arr, sort_keys=[("dummy", "ascending")])

Review comment:
       Yes, it's certainly not the nicest API .. 
   Note that for this specific case, the additional "dummy" is needed because 
it's using the generic `sort_indices` kernel. With `pc.sort_indices_array` it 
would be simpler as `order="ascending"`.
   
   > It's a bit of a pity to have such a verbose convention (having to type 
"ascending" for every sort key).
   
   We could add a "short-cut" to write the default a bit more succinctly in 
case all columns are sorted ascending. For example, we could allow 
`sort_keys=["col1", "col2"]` and translate it to `sort_keys=[("col1", 
"ascending"), ("col2", "ascending")]` on the python side. 
   However, of course once you want to sort (one of the columns) descending, 
you still need the more verbose option.
   
   Another idea could be to have a `order` keyword on the python side, but that 
only works to set the order of *all* sort keys to the same value (so doing 
something like `sort_keys=["col1", "col2"], order="descending"` would be 
simpler).
   
   Pandas indeed splits the sort key and the sort order in two different 
keywords, but as you mention, that has drawbacks as well, specifically when 
sorting by multiple columns. 
   
   I am personally not a big fan of adding any kind of "string syntax".
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to