jonmmease commented on issue #4828: URL: https://github.com/apache/arrow-datafusion/issues/4828#issuecomment-1372904879
Some notes:

The error is happening on line 179 here:

https://github.com/apache/arrow-datafusion/blob/169b5228800f991be913b10ffdfd161e2cbc7baf/datafusion/core/src/physical_plan/repartition.rs#L170-L179

I think the cause is that `arrow::compute::take` is introducing nulls instead of empty lists (due to https://github.com/apache/arrow-rs/issues/3471), so a non-nullable column of lists has null values after `take`. This is an error because the accumulator state field is declared to be non-nullable here:

https://github.com/apache/arrow-datafusion/blob/169b5228800f991be913b10ffdfd161e2cbc7baf/datafusion/physical-expr/src/aggregate/count_distinct.rs#L83-L91

---

So I think there are two paths:

1. Change the behavior of the arrow `take` kernel to return empty lists instead of nulls.
2. Treat nulls as empty lists in the count_distinct accumulator.

What do folks think?
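
To make path 2 concrete, here is a rough sketch of what "treat nulls as empty lists" could look like when merging partial states. It uses only the standard library: `ListSlot` and `merge_distinct_states` are hypothetical names for illustration, and `Option<Vec<i64>>` is a stand-in for one slot of an Arrow list column (`None` for a null slot, `Some(vec![])` for an empty list). It is not the actual DataFusion accumulator code.

```rust
use std::collections::HashSet;

/// Stand-in for one row of a list-typed state column: `None` models a null
/// slot, `Some(vec![])` models an empty list. (Hypothetical simplification;
/// real code would operate on Arrow list arrays.)
type ListSlot = Option<Vec<i64>>;

/// Merge partial distinct-value states, treating a null slot the same as an
/// empty list instead of rejecting it.
fn merge_distinct_states(states: &[ListSlot]) -> HashSet<i64> {
    let mut distinct = HashSet::new();
    for slot in states {
        // `None` (null) contributes nothing, exactly like `Some(vec![])`.
        if let Some(values) = slot {
            distinct.extend(values.iter().copied());
        }
    }
    distinct
}

fn main() {
    // One null slot and one empty list mixed in with real partial states.
    let states = vec![Some(vec![1, 2]), None, Some(vec![]), Some(vec![2, 3])];
    let distinct = merge_distinct_states(&states);
    println!("distinct count = {}", distinct.len()); // prints "distinct count = 3"
}
```

The point of the sketch is only that a null slot and an empty list contribute the same (empty) set of values to the merge, so the accumulator would not need to distinguish them. Whether the state field should then also be declared nullable is a separate question.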
