jorisvandenbossche commented on PR #14781:
URL: https://github.com/apache/arrow/pull/14781#issuecomment-1339114243

   I just realized there is an issue with the simple workaround of sorting a
StructArray by selecting one of its fields: it ignores top-level nulls.
   
   Consider this example:
   
   ```
   In [25]: arr = pa.StructArray.from_arrays(
       ...:     [pa.array([5, 3, 4, 2, 1]), pa.array([1, 2, 3, 4, 5])],
       ...:     names=['a', 'b'],
       ...:     mask=pa.array([False, True, False, False, False]),
       ...: )
   
   In [27]: arr.to_pylist()
   Out[27]: [{'a': 5, 'b': 1}, None, {'a': 4, 'b': 3}, {'a': 2, 'b': 4}, {'a': 1, 'b': 5}]
   
   In [30]: arr_sorted = arr.take(pc.sort_indices(arr.field('a')))
   
   In [31]: arr_sorted.to_pylist()
   Out[31]: [{'a': 1, 'b': 5}, {'a': 2, 'b': 4}, None, {'a': 4, 'b': 3}, {'a': 5, 'b': 1}]
   ```
   
   This follows from what the `field()` method returns: the raw child array,
without the struct's own validity bitmap applied, so the value stored under the
null struct still takes part in the sort. We also have the
`StructArray.flatten()` method, which does give the correct array to sort by.
It flattens all fields while we only need one, but we could directly use
`StructArray::GetFlattenedField`, which `flatten()` uses under the hood:
   
   ```
   In [36]: arr_sorted2 = arr.take(pc.sort_indices(arr.flatten()[0]))
   
   In [37]: arr_sorted2.to_pylist()
   Out[37]: [{'a': 1, 'b': 5}, {'a': 2, 'b': 4}, {'a': 4, 'b': 3}, {'a': 5, 'b': 1}, None]
   ```
   
   
   

