alamb commented on issue #6906:
URL: https://github.com/apache/datafusion/issues/6906#issuecomment-2356554263

   > > > I am not familiar enough with StringViewArray, is it ok to do that? 
And will it lead to a extremely bad performance?
   > > 
   > > 
   > > I think using a single `Buffer` for each string will be bad for 
performance (likely worse than storing as `String` and copying them at the end. 
`StringViewArray` is really optimized for a small number of buffers (even 
though in theory it could have 2B of them as it is indexed on `i32`)
   > 
   > Ok, for `StringView min/max`, seems we can just start with using 
`Vec<u128>(views)` to store the inlined state(<= 12), use `Vec<String>` to 
store the unlined.
   > 
   > And when converting it to `StringViewArray`, we just copy the 
`Vec<String>` to create the buffer (`GroupsAccumulatorAdapter` copy the states 
too).
   > 
   > For the short strings(<=12), it can avoid allocating `String`, and for the 
long ones, it just do the same thing as `GroupsAccumulatorAdapter`. Seems it 
can have a better performance(due optimization for shorts)?
   
   Seems like a reasonable place to start in my opinion. If we want to get more 
sophisticated at a later time we can try something more exotic. I suspect there 
will be times when individual allocations will be faster and times when a 
buffer will be faster and there will be memory consumption tradeoffs as well.
   
   TLDR we should implement something and as long as it is better than what is 
currently going on that will be good. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to