devanbenz commented on issue #6906:
URL: https://github.com/apache/datafusion/issues/6906#issuecomment-2369040854

   > > I have had more time to take a look at this and sort of just wrap my 
head around how `GroupsAccumulatorAdapter` works a bit. I'm seeing that the 
performance impact is happening here
   > > 
https://github.com/apache/datafusion/blob/a35d0075744a058f81bd9ebed747e2e597434019/datafusion/functions-aggregate-common/src/aggregate/groups_accumulator.rs#L475
   > > 
   > > in `slice_and_maybe_filter`. I'm under the assumption that this is 
happening due to the difference between the `BinaryView` and the `Binary` 
scalar values. Taking a look at 
[influxdata.com/blog/faster-queries-with-stringview-part-one-influxdb](https://www.influxdata.com/blog/faster-queries-with-stringview-part-one-influxdb/),
 I understand that `BinaryView` is effectively a non-contiguous structure, 
whereas `Binary` is contiguous and so is easily sliced. So the idea here is to 
effectively change the underlying state in which the accumulator structure 
receives that data, thus either making it easier to call slice or removing the 
call to slice entirely?
   > 
   > I guess the current goal is to remove the call to slice and get 
performance that is at least no worse than `StringArray + GroupsAccumulatorAdapter`, as 
mentioned above.
   
   Awesome, thank you @Rachelint.
   
   I've begun working on a small POC locally with this information. It took me 
a bit of reading up on context to get going for sure. After reading through 
https://docs.rs/datafusion/latest/datafusion/physical_plan/trait.Accumulator.html
 and 
https://docs.rs/datafusion/latest/datafusion/logical_expr/trait.GroupsAccumulator.html,
 tracing through the code locally, and reading through the discussion a few 
times, I now have a better understanding of what's going on and where to start.
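
   For anyone following along, here is a rough std-only sketch of the layout 
difference as I understand it from the blog post. The types `OffsetBinary`, 
`ViewBinary`, and `View`, and the `buffer_cap` parameter, are made up for 
illustration and are not the arrow-rs API. The only detail taken from the 
Arrow format is that values of up to 12 bytes are stored inline in the view.

```rust
// Hypothetical, std-only model of the two layouts; these are NOT the arrow-rs types.

/// Offset-based layout (in the spirit of `Binary`): every value lives in one
/// contiguous buffer and is located by a pair of adjacent offsets.
struct OffsetBinary {
    offsets: Vec<usize>, // always holds (number of values + 1) entries
    values: Vec<u8>,
}

impl OffsetBinary {
    fn from_items(items: &[&[u8]]) -> Self {
        let mut offsets = vec![0];
        let mut values = Vec::new();
        for item in items {
            values.extend_from_slice(item);
            offsets.push(values.len());
        }
        Self { offsets, values }
    }

    fn value(&self, i: usize) -> &[u8] {
        &self.values[self.offsets[i]..self.offsets[i + 1]]
    }
}

/// Longest value stored directly inside a view (12 bytes in the Arrow format).
const MAX_INLINE: usize = 12;

/// View-based layout (in the spirit of `BinaryView`): a short value is held
/// inline, a long value points into one of several separate data buffers.
enum View {
    Inline(Vec<u8>),
    Ref { buffer: usize, offset: usize, len: usize },
}

struct ViewBinary {
    views: Vec<View>,
    buffers: Vec<Vec<u8>>,
}

impl ViewBinary {
    /// `buffer_cap` is an illustrative knob: once a data buffer would grow
    /// past it, the next long value starts a new buffer.
    fn from_items(items: &[&[u8]], buffer_cap: usize) -> Self {
        let mut views = Vec::new();
        let mut buffers: Vec<Vec<u8>> = Vec::new();
        for item in items {
            if item.len() <= MAX_INLINE {
                views.push(View::Inline(item.to_vec()));
                continue;
            }
            let needs_new = match buffers.last() {
                Some(last) => last.len() + item.len() > buffer_cap,
                None => true,
            };
            if needs_new {
                buffers.push(Vec::new());
            }
            let buffer = buffers.len() - 1;
            let offset = buffers[buffer].len();
            buffers[buffer].extend_from_slice(item);
            views.push(View::Ref { buffer, offset, len: item.len() });
        }
        Self { views, buffers }
    }

    fn value(&self, i: usize) -> &[u8] {
        match &self.views[i] {
            View::Inline(bytes) => bytes.as_slice(),
            View::Ref { buffer, offset, len } => {
                &self.buffers[*buffer][*offset..*offset + *len]
            }
        }
    }
}
```

   Both structures return the same bytes for a given index. The difference is 
that the offset-based one keeps all value bytes in a single buffer, while the 
view-based one may spread them across inline views and several data buffers.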


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

