alamb commented on issue #6906: URL: https://github.com/apache/datafusion/issues/6906#issuecomment-2356554263
> > > I am not familiar enough with StringViewArray, is it ok to do that? And will it lead to a extremely bad performance? > > > > > > I think using a single `Buffer` for each string will be bad for performance (likely worse than storing as `String` and copying them at the end. `StringViewArray` is really optimized for a small number of buffers (even though in theory it could have 2B of them as it is indexed on `i32`) > > Ok, for `StringView min/max`, seems we can just start with using `Vec<u128>(views)` to store the inlined state(<= 12), use `Vec<String>` to store the unlined. > > And when converting it to `StringViewArray`, we just copy the `Vec<String>` to create the buffer (`GroupsAccumulatorAdapter` copy the states too). > > For the short strings(<=12), it can avoid allocating `String`, and for the long ones, it just do the same thing as `GroupsAccumulatorAdapter`. Seems it can have a better performance(due optimization for shorts)? Seems like a reasonable place to start in my opinion. If we want to get more sophisticated at a later time we can try something more exotic. I suspect there will be times when individual allocations will be faster and times when a buffer will be faster and there will be memory consumption tradeoffs as well. TLDR we should implement something and as long as it is better than what is currently going on that will be good. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org