jayzhan211 commented on PR #15867: URL: https://github.com/apache/datafusion/pull/15867#issuecomment-2833182081
```rust
fn state(&mut self) -> Result<Vec<ScalarValue>> {
    let scalars = self.values.iter().cloned().collect::<Vec<_>>();
    let arr = ScalarValue::new_list_nullable(scalars.as_slice(), &self.state_data_type);
    Ok(vec![ScalarValue::List(arr)])
}
```

We clone the hash set from the partial aggregation and convert it to a List for the final aggregation. In the high-cardinality case, where most of the values are distinct, we effectively aggregate twice and pay for an additional clone.

I can think of 2 possible solutions:

1. Use a single aggregation stage, but somehow run it in parallel. We could try converting it to a single aggregation and see whether it is faster than the current version.
2. Find a way to avoid cloning the hash set into a list array, and initialize the accumulator with the hash set in the final aggregation state. Ensuring zero copy all the way down to the final aggregation is probably not trivial.
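
To make the cost in option 2 concrete, here is a minimal std-only sketch. `DistinctAcc` and its methods are hypothetical stand-ins, not DataFusion types or APIs. It contrasts a cloning `state` with one that moves the values out of the set, and it assumes the state is read exactly once per accumulator. Moving avoids the per-value clone, but the values still pass through a `Vec` and are hashed again in `merge`, so this is not the full zero-copy path described above.

```rust
use std::collections::HashSet;
use std::mem;

/// Hypothetical distinct accumulator used only for illustration.
#[derive(Default)]
struct DistinctAcc {
    values: HashSet<String>,
}

impl DistinctAcc {
    /// Partial aggregation: record each input value once.
    fn update(&mut self, batch: &[&str]) {
        self.values.extend(batch.iter().map(|s| s.to_string()));
    }

    /// Mirrors the quoted `state`: every value is cloned and the set stays populated.
    fn state_cloned(&self) -> Vec<String> {
        self.values.iter().cloned().collect()
    }

    /// Alternative: move the values out, leaving an empty set behind.
    /// Assumption: the caller never needs this accumulator's values again.
    fn state_taken(&mut self) -> Vec<String> {
        mem::take(&mut self.values).into_iter().collect()
    }

    /// Final aggregation: fold a partial state into this accumulator.
    /// Each value is hashed a second time here, in both variants.
    fn merge(&mut self, state: Vec<String>) {
        self.values.extend(state);
    }

    /// Number of distinct values seen so far.
    fn count(&self) -> usize {
        self.values.len()
    }
}
```

With mostly distinct inputs, `state_cloned` allocates a second copy of nearly every string, while `state_taken` only moves ownership. The repeated hashing in `merge` is what option 1 (a single aggregation stage) would remove.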