jorisvandenbossche commented on issue #34173: URL: https://github.com/apache/arrow/issues/34173#issuecomment-1445804812
> So I think the correct behavior of skip_nulls=False should be (apologies for the bad pseudo-code):

The problem with this logic is that the `mode` kernel also returns the count (since we support returning more than just the "top" mode in https://github.com/apache/arrow/pull/8637). So what value would you use for the count if you take the null count into account like that and there is an actual most frequent element?

> If skip_nulls=True then the behavior should be the same as `mode(x.filter(pc.is_valid(x)))` and should only return null if every element is null.

Agreed that this is the typical logic, but currently for an empty array (or an all-null array after skipping the nulls), we return an empty result, not null:

```
>>> pc.mode(pa.array([], pa.int64()), 1, skip_nulls=True)
<pyarrow.lib.StructArray object at 0x7fbdaf658580>
-- is_valid: all not null
-- child 0 type: int64
  []
-- child 1 type: int64
  []
```

Should that be changed to a null struct / struct of nulls?
