alamb commented on issue #18411: URL: https://github.com/apache/datafusion/issues/18411#issuecomment-3665195440
That being said, I had one idea for optimizing the "all short strings" case (i.e. every string is 12 bytes or less, so it fits inline in its view). I do think we could have the group values implementation special-case short strings:

1. If all strings in the input array are short (no data buffers), store them in a `HashSet<u128>` (i.e. store the view values directly).
2. If a new batch arrives that has longer strings, fall back to the current implementation that stores data buffers.

It would certainly help this query, and I could definitely see it being helpful for real queries on short string columns 🤔
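To make the idea concrete, here is a minimal, standalone sketch of the two-mode approach. It is not DataFusion code: `ShortStringGroups`, `pack_short`, and `unpack_short` are hypothetical names, and a plain `HashSet<String>` stands in for the current data-buffer-based implementation. The packing assumes the inline view layout (4 bytes of length followed by up to 12 bytes of zero-padded data); a real implementation would presumably read the `u128` views from the array directly instead of re-packing strings.

```rust
use std::collections::HashSet;

/// Longest string that fits inline in a 16-byte view.
const MAX_INLINE_LEN: usize = 12;

/// Pack a short string into a u128: 4 bytes of little-endian length,
/// then up to 12 bytes of data, zero padded. Storing the length keeps
/// "a" and "a\0" distinct. Returns None if the string is too long.
fn pack_short(s: &str) -> Option<u128> {
    let bytes = s.as_bytes();
    if bytes.len() > MAX_INLINE_LEN {
        return None;
    }
    let mut raw = [0u8; 16];
    raw[..4].copy_from_slice(&(bytes.len() as u32).to_le_bytes());
    raw[4..4 + bytes.len()].copy_from_slice(bytes);
    Some(u128::from_le_bytes(raw))
}

/// Inverse of `pack_short`, needed when falling back to the general path.
fn unpack_short(v: u128) -> String {
    let raw = v.to_le_bytes();
    let len = u32::from_le_bytes([raw[0], raw[1], raw[2], raw[3]]) as usize;
    String::from_utf8_lossy(&raw[4..4 + len]).into_owned()
}

/// Hypothetical distinct-value store with a fast path for short strings.
enum ShortStringGroups {
    /// Every value seen so far fits inline; store the packed views.
    Inline(HashSet<u128>),
    /// Stand-in for the current implementation that keeps data buffers.
    General(HashSet<String>),
}

impl ShortStringGroups {
    fn new() -> Self {
        Self::Inline(HashSet::new())
    }

    fn insert_batch(&mut self, batch: &[&str]) {
        match self {
            Self::Inline(set) => {
                // The fast path applies only if the whole batch is short.
                let packed: Option<Vec<u128>> =
                    batch.iter().map(|s| pack_short(s)).collect();
                if let Some(values) = packed {
                    set.extend(values);
                } else {
                    // A longer string arrived: migrate existing values
                    // and switch to the general representation for good.
                    let mut general: HashSet<String> =
                        set.iter().map(|v| unpack_short(*v)).collect();
                    general.extend(batch.iter().map(|s| s.to_string()));
                    *self = Self::General(general);
                }
            }
            Self::General(set) => set.extend(batch.iter().map(|s| s.to_string())),
        }
    }

    fn len(&self) -> usize {
        match self {
            Self::Inline(set) => set.len(),
            Self::General(set) => set.len(),
        }
    }

    fn is_inline(&self) -> bool {
        matches!(self, Self::Inline(_))
    }
}
```

In this sketch the fallback is one-way: once any batch contains a long string, the store stays in the general mode, which keeps the fast path free of per-value branching.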
