alamb commented on issue #6906: URL: https://github.com/apache/datafusion/issues/6906#issuecomment-2356326589
> The challenge of `String` seems to be:
>
> * If we just simply use a `Vec<String>` like `primitives` to keep the min/max values, it is too expensive to convert them to `StringArray`/`StringViewArray` (many, many copies)

I think the overhead is actually mostly the additional (small) allocation for each `String`. For queries with a small number of groups (like 100), an extra 100 allocations isn't all that bad. For queries with millions of groups, the overhead is substantial.

> * But if we use a `StringArray`-like approach to keep the values, we can't update the min/max values.

I suppose we could potentially update the values in place as long as the new strings were shorter :thinking:

> * So finally we need to use a `StringViewArray`-like approach, but that still has the new challenge of gc?
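A minimal sketch of how the "update in place if shorter" idea and the gc question might fit together, assuming a hypothetical `MinStringState` type (not a DataFusion or arrow-rs API). All groups share one `StringArray`-style byte buffer instead of owning one `String` each; a smaller value overwrites the old slot when it fits, is appended otherwise, and the buffer is compacted once more than half of it is unreferenced (the 50% threshold is an arbitrary assumption). Converting the buffer into a real `StringArray`/`StringViewArray` is out of scope here.

```rust
/// Hypothetical per-group "min" state for strings (not a DataFusion API).
/// All values live in one shared byte buffer, StringArray-style; each group
/// stores a (start, len) slot instead of its own `String` allocation.
struct MinStringState {
    buffer: Vec<u8>,
    slots: Vec<Option<(usize, usize)>>, // None = group has seen no value yet
    wasted: usize,                      // bytes in `buffer` that no slot refers to
}

impl MinStringState {
    fn new(num_groups: usize) -> Self {
        Self { buffer: Vec::new(), slots: vec![None; num_groups], wasted: 0 }
    }

    fn get(&self, group: usize) -> Option<&str> {
        self.slots[group].map(|(start, len)| {
            // Each slot covers a whole string copied from a &str, so it is valid UTF-8.
            std::str::from_utf8(&self.buffer[start..start + len]).expect("valid UTF-8")
        })
    }

    /// Append `bytes` to the end of the buffer and point `group` at them.
    fn append(&mut self, group: usize, bytes: &[u8]) {
        let start = self.buffer.len();
        self.buffer.extend_from_slice(bytes);
        self.slots[group] = Some((start, bytes.len()));
    }

    fn update_min(&mut self, group: usize, value: &str) {
        let bytes = value.as_bytes();
        let current = self.slots[group];
        match current {
            None => self.append(group, bytes),
            Some((start, len)) => {
                // Byte-wise comparison matches Rust's `str` ordering.
                if bytes >= &self.buffer[start..start + len] {
                    return; // existing min is smaller or equal: nothing to do
                }
                if bytes.len() <= len {
                    // New min fits in the old slot: overwrite in place;
                    // the unused tail of the old slot becomes garbage.
                    self.buffer[start..start + bytes.len()].copy_from_slice(bytes);
                    self.wasted += len - bytes.len();
                    self.slots[group] = Some((start, bytes.len()));
                } else {
                    // Longer value: append it; the whole old slot becomes garbage.
                    self.wasted += len;
                    self.append(group, bytes);
                }
            }
        }
        // Arbitrary policy: compact once more than half the buffer is garbage.
        if self.wasted > self.buffer.len() / 2 {
            self.gc();
        }
    }

    /// Copy only live values into a fresh buffer and rewrite the slots.
    fn gc(&mut self) {
        let mut compacted = Vec::with_capacity(self.buffer.len() - self.wasted);
        for slot in self.slots.iter_mut() {
            if let Some((start, len)) = slot {
                let new_start = compacted.len();
                compacted.extend_from_slice(&self.buffer[*start..*start + *len]);
                *start = new_start;
            }
        }
        self.buffer = compacted;
        self.wasted = 0;
    }
}

fn main() {
    let mut state = MinStringState::new(2);
    state.update_min(0, "pear");
    state.update_min(0, "apple"); // longer than "pear": appended, old slot is garbage
    state.update_min(0, "ant"); // fits in "apple"'s slot: overwritten in place, then gc runs
    state.update_min(1, "zebra");
    state.update_min(1, "zoo"); // "zoo" > "zebra": no change
    assert_eq!(state.get(0), Some("ant"));
    assert_eq!(state.get(1), Some("zebra"));
    // prints: buffer = "antzebra"
    println!("buffer = {:?}", std::str::from_utf8(&state.buffer).unwrap());
}
```

Compaction copies only live bytes, so its cost is paid against the garbage that triggered it; a `StringViewArray`-based accumulator would presumably face a similar trade-off when deciding how often to compact its data buffers.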