Rachelint commented on issue #6906: URL: https://github.com/apache/datafusion/issues/6906#issuecomment-2356428850
I think https://github.com/apache/datafusion/pull/6800#issuecomment-1622236683 > > I am not familiar enough with StringViewArray, is it ok to do that? And will it lead to a extremely bad performance? > > I think using a single `Buffer` for each string will be bad for performance (likely worse than storing as `String` and copying them at the end. `StringViewArray` is really optimized for a small number of buffers (even though in theory it could have 2B of them as it is indexed on `i32`) Ok, for `StringView min/max`, seems we can just start with using `Vec<u128>(views)` to store the inlined state(<= 12), use `Vec<String>` to store the unlined. And when conerting it to `StringViewArray`, we just copy the `Vec<String>` to create the buffer (`GroupsAccumulatorAdapter` copy the states too). For the short strings(<=12), it can avoid allocating `String`, and for the long ones, it just do the same thing as `GroupsAccumulatorAdapter`. Seems it can have a better performance(due optimization for shorts)? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org