westonpace commented on issue #34173:
URL: https://github.com/apache/arrow/issues/34173#issuecomment-1447252341
Ah, I see the problem now. If we are going to be pedantic about it then "an
aggregate function returns exactly one row" and the return type *should* be
`ListType(list<item: struct<mode: int64, count: int64>>)`. In the presence of
nulls the answer is then an array of type `ListType(list<item: struct<mode:
int64, count: int64>>)` with exactly one null element.
Now, it seems that `mode` is properly registered as a vector function and
not an aggregate function, so we don't have to be pedantic (you can't use
`mode` in Acero anyway, so why bother trying to follow the rules). Given this, I
would say that we should return whatever is most convenient. I think that
would probably be to treat `null` as its own distinct value and return:
```
pc.mode([1, 2, 2, None, None], 2, skip_nulls=False)
<pyarrow.lib.StructArray object at 0x7f5aa7b5cd00>
-- is_valid: all not null
-- child 0 type: int64
[
2,
null
]
-- child 1 type: int64
[
2,
2
]
```
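
To make the "null is just another value" idea concrete, here is a rough pure-Python model of the semantics I have in mind. This is only a sketch, not how the kernel is implemented: `mode_sketch` is a hypothetical helper, and the tie-breaking rule (smaller value first, null last) is an assumption chosen to match the output above.

```python
from collections import Counter


def mode_sketch(values, n=1, skip_nulls=True):
    """Hypothetical model of the proposed behaviour; not pyarrow's implementation."""
    if skip_nulls:
        # Nulls (None) are dropped before counting.
        values = [v for v in values if v is not None]
    # With skip_nulls=False, None is counted like any other value.
    counts = Counter(values)
    # Most frequent first. Assumed tie-break: smaller value first, null last.
    ranked = sorted(
        counts.items(),
        key=lambda kv: (-kv[1], kv[0] is None, kv[0] if kv[0] is not None else 0),
    )
    return [{"mode": value, "count": count} for value, count in ranked[:n]]


result = mode_sketch([1, 2, 2, None, None], n=2, skip_nulls=False)
print(result)
```

With `skip_nulls=True` the same input would instead give the modes `2` and `1`, since the nulls are discarded before counting.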