jonmmease commented on issue #4828:
URL: 
https://github.com/apache/arrow-datafusion/issues/4828#issuecomment-1372904879

   Some notes:
   
   The error is happening on line 179 here:
   
   
https://github.com/apache/arrow-datafusion/blob/169b5228800f991be913b10ffdfd161e2cbc7baf/datafusion/core/src/physical_plan/repartition.rs#L170-L179
   
   I think the cause is that `arrow::compute::take` is introducing nulls 
instead of empty lists (due to https://github.com/apache/arrow-rs/issues/3471), 
so a column of lists that is declared non-nullable ends up containing null 
values after `take`.
   
   This is an error because the accumulator state field is declared to be 
non-nullable here:
   
   
https://github.com/apache/arrow-datafusion/blob/169b5228800f991be913b10ffdfd161e2cbc7baf/datafusion/physical-expr/src/aggregate/count_distinct.rs#L83-L91
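   
   To make the failure mode concrete, here is a toy model in plain Rust. It 
does not use the arrow-rs API: `Field`, `ListColumn`, and `validate` are 
hypothetical names that only mimic the kind of nullability check that rejects 
a non-nullable column once it has picked up a null.

```rust
/// Toy stand-in for a schema field (hypothetical, not arrow's `Field`).
struct Field {
    name: &'static str,
    nullable: bool,
}

/// Toy list column: `None` models a null slot, `Some(vec![])` an empty list.
type ListColumn = Vec<Option<Vec<i64>>>;

/// Reject a column that contains nulls when its field is declared non-nullable.
fn validate(field: &Field, column: &ListColumn) -> Result<(), String> {
    let null_count = column.iter().filter(|v| v.is_none()).count();
    if !field.nullable && null_count > 0 {
        return Err(format!(
            "column '{}' is declared non-nullable but contains {} null value(s)",
            field.name, null_count
        ));
    }
    Ok(())
}
```

   In this model an empty list passes the check and a null does not, which is 
why a `take` that turns empty lists into nulls would trip the error.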
   
   ---
   
   So I think there are two paths:
    1. Change the behavior of the arrow `take` kernel to return empty lists 
instead of nulls.
    2. Treat nulls as empty lists in the count_distinct accumulator.
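   
   As a rough illustration of option 2, the sketch below merges partial 
distinct-value states while treating a null list the same as an empty one. It 
is plain Rust over a toy representation, not the actual count_distinct 
accumulator: `ListState` and `merge_distinct` are hypothetical names, and the 
`i64` element type is an assumption made for the example.

```rust
use std::collections::HashSet;

/// Toy accumulator state column: `None` models a null list slot,
/// `Some(vec![])` models an empty list. (Hypothetical representation.)
type ListState = Vec<Option<Vec<i64>>>;

/// Merge partial states into one set of distinct values.
/// A null list contributes nothing, exactly like an empty list.
fn merge_distinct(states: &ListState) -> HashSet<i64> {
    let mut distinct = HashSet::new();
    // `flatten` skips the `None` entries, so nulls are silently ignored.
    for values in states.iter().flatten() {
        distinct.extend(values.iter().copied());
    }
    distinct
}
```

   With this approach the merged result is the same whether `take` produced a 
null or an empty list, though the nullability declared for the state field 
would still need to agree with the data that is actually produced.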
   
   What do folks think?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

