devanbenz commented on issue #6906: URL: https://github.com/apache/datafusion/issues/6906#issuecomment-2369040854
> > I have had more time to take a look at this and wrap my head around how `GroupsAccumulatorAdapter` works. I'm seeing that the performance impact is happening here
> > https://github.com/apache/datafusion/blob/a35d0075744a058f81bd9ebed747e2e597434019/datafusion/functions-aggregate-common/src/aggregate/groups_accumulator.rs#L475
> >
> > in `slice_and_maybe_filter`. I'm under the assumption that this is due to the difference between the `BinaryView` and the `Binary` scalar values. Taking a look at [influxdata.com/blog/faster-queries-with-stringview-part-one-influxdb](https://www.influxdata.com/blog/faster-queries-with-stringview-part-one-influxdb/), I understand that `BinaryView` is effectively a non-contiguous structure, whereas `Binary` is contiguous and so is easily sliced. So the idea here is to change the underlying state in which the accumulator structure receives that data, making it either easier to call slice or removing the call to slice entirely?
>
> I guess the current goal is to remove the call to slice, and get performance at least no worse than `StringArray + GroupsAccumulatorAdapter`, as mentioned above.

Awesome, thank you @Rachelint. I've begun working on a small POC locally with this information. It took me a bit of reading up on context to get going. After reading through https://docs.rs/datafusion/latest/datafusion/physical_plan/trait.Accumulator.html and https://docs.rs/datafusion/latest/datafusion/logical_expr/trait.GroupsAccumulator.html, tracing through the code locally, and reading through the discussion a few times, I now have a better understanding of what's going on and where to start.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
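
For readers following the contiguous vs. non-contiguous point in the quoted text, here is a rough, std-only sketch of the two layouts. It is a simplified model under stated assumptions, not the arrow-rs types and not what `slice_and_maybe_filter` does: `ContiguousBinary`, `ViewBinary`, `View`, `value_range`, and `gather` are hypothetical names invented for illustration, and the inline case uses a `Vec<u8>` where the real view format packs up to 12 bytes into a fixed 16-byte view.

```rust
// Simplified model of the two layouts (hypothetical types, not arrow-rs).

/// `Binary`-style layout: one values buffer plus n + 1 offsets.
struct ContiguousBinary {
    offsets: Vec<usize>,
    values: Vec<u8>,
}

impl ContiguousBinary {
    fn from_values(items: &[&[u8]]) -> Self {
        let mut offsets = vec![0];
        let mut values = Vec::new();
        for item in items {
            values.extend_from_slice(item);
            offsets.push(values.len());
        }
        ContiguousBinary { offsets, values }
    }

    /// The bytes of rows `start..start + len` are a single contiguous range.
    fn value_range(&self, start: usize, len: usize) -> &[u8] {
        &self.values[self.offsets[start]..self.offsets[start + len]]
    }
}

/// `BinaryView`-style layout: each row is a view that either holds short
/// data inline or points into one of several data buffers.
enum View {
    Inline(Vec<u8>), // assumption: stands in for the <= 12 byte inline case
    Ref { buffer: usize, offset: usize, len: usize },
}

struct ViewBinary {
    views: Vec<View>,
    buffers: Vec<Vec<u8>>,
}

impl ViewBinary {
    fn from_values(items: &[&[u8]]) -> Self {
        let mut views = Vec::new();
        let mut buffer = Vec::new();
        for item in items {
            if item.len() <= 12 {
                views.push(View::Inline(item.to_vec()));
            } else {
                views.push(View::Ref { buffer: 0, offset: buffer.len(), len: item.len() });
                buffer.extend_from_slice(item);
            }
        }
        ViewBinary { views, buffers: vec![buffer] }
    }

    /// Each row has to be resolved through its own view.
    fn value(&self, i: usize) -> &[u8] {
        match &self.views[i] {
            View::Inline(bytes) => bytes,
            View::Ref { buffer, offset, len } => &self.buffers[*buffer][*offset..*offset + *len],
        }
    }

    /// Collecting the bytes of a row range means visiting every view in it.
    fn gather(&self, start: usize, len: usize) -> Vec<u8> {
        (start..start + len)
            .flat_map(|i| self.value(i).iter().copied())
            .collect()
    }
}

fn main() {
    let items: [&[u8]; 3] = [b"ab", b"a value longer than twelve bytes", b"cd"];
    let contiguous = ContiguousBinary::from_values(&items);
    let views = ViewBinary::from_values(&items);

    // Same logical rows, reached through different layouts.
    let from_offsets = contiguous.value_range(1, 2);
    let from_views = views.gather(1, 2);
    assert_eq!(from_offsets, from_views.as_slice());
    println!("rows 1..3 hold {} bytes in both layouts", from_offsets.len());
}
```

The sketch only shows why the byte data for a row range is one span in the offsets-based layout but scattered in the view-based one. It makes no claim about where the measured slowdown actually comes from.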