Rich-T-kid commented on PR #21589: URL: https://github.com/apache/datafusion/pull/21589#issuecomment-4246056045
- **ScalarValue::try_from_array (38%)**: allocating a new ScalarValue heap object for every single row
- **hash_one (18%)**: hashing the full ScalarValue, including its string contents, on every row
- **PartialEq::eq (5.8%)**: comparing ScalarValues in the HashMap on every row

Lots of heap allocations are happening on the hot path (**_xzm_free**). A solution to this may be to pre-allocate space for the unique values array and the HashMap.

The core problem with using ScalarValue in the hot path is that it's a heap-allocated enum that gets created and destroyed for every single row. We should work directly with the array buffers instead.
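
A rough sketch of what "work directly with the array buffers" could look like, using only the standard library. `StringColumn` and `first_occurrences` are hypothetical names made up for illustration, not DataFusion or Arrow APIs; the struct only imitates the offsets-plus-data layout of an Arrow string array. The idea is that each row is hashed and compared as a borrowed `&[u8]` slice, so no per-row value object is allocated, and both the set and the output vector are pre-allocated once (sized for the worst case where every row is distinct, which is an assumption that may over-allocate).

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for an Arrow-style string column: one contiguous
/// byte buffer plus row offsets (row i spans offsets[i]..offsets[i + 1]).
struct StringColumn {
    data: Vec<u8>,
    offsets: Vec<usize>,
}

impl StringColumn {
    /// Build the column by appending every value to a single buffer.
    fn from_strs(values: &[&str]) -> Self {
        let mut data = Vec::new();
        let mut offsets = Vec::with_capacity(values.len() + 1);
        offsets.push(0);
        for v in values {
            data.extend_from_slice(v.as_bytes());
            offsets.push(data.len());
        }
        Self { data, offsets }
    }

    /// Number of rows in the column.
    fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    /// Borrow row `i` straight from the buffer; nothing is allocated.
    fn value(&self, i: usize) -> &[u8] {
        &self.data[self.offsets[i]..self.offsets[i + 1]]
    }
}

/// Returns the row index of the first occurrence of each distinct value,
/// in order of first appearance.
fn first_occurrences(col: &StringColumn) -> Vec<usize> {
    // Pre-allocate once, assuming the worst case of all-distinct rows.
    let mut seen: HashSet<&[u8]> = HashSet::with_capacity(col.len());
    let mut unique_rows = Vec::with_capacity(col.len());
    for row in 0..col.len() {
        // The key is a slice into the column buffer, not an owned value.
        if seen.insert(col.value(row)) {
            unique_rows.push(row);
        }
    }
    unique_rows
}
```

With a column built from `["a", "b", "a", "c", "b"]`, `first_occurrences` returns `[0, 1, 3]`. Returning row indices rather than copied values keeps the loop allocation-free; the distinct values can be gathered from the original buffer afterwards if needed.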
