Rich-T-kid commented on PR #21589:
URL: https://github.com/apache/datafusion/pull/21589#issuecomment-4246056045

- **ScalarValue::try_from_array (38%)**: allocates a new ScalarValue (and its heap data) for every single row
- **hash_one (18%)**: hashes the full ScalarValue, including its string contents, on every row
- **PartialEq::eq (5.8%)**: compares ScalarValues in the HashMap on every row
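To make the profile concrete, here is a minimal std-only sketch of the per-row pattern it describes. It does not use DataFusion. `Scalar` is a hypothetical stand-in for `ScalarValue::Utf8`. `StringColumn` is a hypothetical, simplified Arrow-style string column (offsets into one shared buffer, no nulls). `scalar_at` only mimics what `ScalarValue::try_from_array` does for a string row. Each row pays for one `String` allocation, one hash of the full contents, and an equality check on lookup, and duplicates are freed immediately.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for `ScalarValue::Utf8`: each value owns a heap-allocated `String`.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum Scalar {
    Utf8(String),
}

/// Hypothetical, simplified Arrow-style string column (no nulls):
/// row `i` is `values[offsets[i]..offsets[i + 1]]`.
struct StringColumn {
    offsets: Vec<usize>,
    values: String,
}

impl StringColumn {
    fn from_strs(rows: &[&str]) -> Self {
        let mut offsets = Vec::with_capacity(rows.len() + 1);
        let mut values = String::new();
        offsets.push(0);
        for r in rows {
            values.push_str(r);
            offsets.push(values.len());
        }
        StringColumn { offsets, values }
    }

    fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    /// Borrow row `i` from the shared values buffer (no allocation).
    fn value(&self, i: usize) -> &str {
        &self.values[self.offsets[i]..self.offsets[i + 1]]
    }

    /// Mimics `ScalarValue::try_from_array` for a string row: copies it into a new owned value.
    fn scalar_at(&self, i: usize) -> Scalar {
        Scalar::Utf8(self.value(i).to_string())
    }
}

/// The per-row pattern from the profile: allocate, hash the full string, compare on lookup.
fn count_distinct_scalar(col: &StringColumn) -> usize {
    let mut seen: HashSet<Scalar> = HashSet::new();
    for i in 0..col.len() {
        let s = col.scalar_at(i); // one heap allocation per row
        seen.insert(s); // hash + eq per row; a duplicate `s` is dropped (freed) here
    }
    seen.len()
}
```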
   
   Lots of heap allocations are happening on the hot path (**_xzm_free**). One possible mitigation is to pre-allocate space for the unique-values array and the hashmap.
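A sketch of the pre-allocation idea, assuming the caller can supply a rough estimate of the number of distinct values (`dedup_presized` and `expected_distinct` are hypothetical names). `Vec::with_capacity` and `HashMap::with_capacity` reserve space up front, so neither container regrows mid-loop as long as the estimate holds.

```rust
use std::collections::HashMap;

/// Collect distinct values and the group index of each row, pre-sizing every
/// container. `expected_distinct` is a hypothetical caller-supplied estimate.
fn dedup_presized(rows: &[&str], expected_distinct: usize) -> (Vec<String>, Vec<usize>) {
    let mut uniques: Vec<String> = Vec::with_capacity(expected_distinct);
    let mut index: HashMap<String, usize> = HashMap::with_capacity(expected_distinct);
    let mut group_ids: Vec<usize> = Vec::with_capacity(rows.len());
    for row in rows {
        // Look up with a borrowed &str, so existing values are not copied.
        let id = match index.get(*row) {
            Some(&id) => id,
            None => {
                let id = uniques.len();
                uniques.push(row.to_string());
                index.insert(row.to_string(), id);
                id
            }
        };
        group_ids.push(id);
    }
    (uniques, group_ids)
}
```

Pre-sizing only removes growth reallocations. The lookup above also uses a borrowed `&str`, so only first-seen values are copied. Pre-sizing by itself would not remove a per-row `ScalarValue` allocation like the one in the profile.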
   The core problem with using ScalarValue on the hot path is that it's an owned enum carrying heap-allocated data (here, the string contents), and one gets created and destroyed for every single row. We should work directly with the array buffers instead.
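A minimal sketch of working directly with the array buffers. It again uses a hypothetical, simplified `StringColumn` rather than the real arrow-rs `StringArray`. The hash map is keyed by `&str` slices borrowed from the column's value buffer, and each group is identified by the row where it first appears instead of by an owned copy (`group_rows` is a hypothetical name).

```rust
use std::collections::HashMap;

/// Hypothetical, simplified Arrow-style string column (no nulls):
/// row `i` is `values[offsets[i]..offsets[i + 1]]`.
struct StringColumn {
    offsets: Vec<usize>,
    values: String,
}

impl StringColumn {
    fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    /// Borrow row `i` straight from the shared values buffer.
    fn value(&self, i: usize) -> &str {
        &self.values[self.offsets[i]..self.offsets[i + 1]]
    }
}

/// Assign a group id to every row with no per-row allocation: keys are
/// slices of the column's buffer, and each distinct value is represented
/// by the index of the row where it first appeared.
fn group_rows(col: &StringColumn) -> (Vec<usize>, Vec<usize>) {
    let mut ids: HashMap<&str, usize> = HashMap::with_capacity(col.len());
    let mut first_row_of_group: Vec<usize> = Vec::new();
    let mut group_ids: Vec<usize> = Vec::with_capacity(col.len());
    for i in 0..col.len() {
        let key = col.value(i); // &str into the buffer, nothing copied
        let next_id = first_row_of_group.len();
        let id = *ids.entry(key).or_insert_with(|| {
            // First time this value is seen: remember the row, not a copy.
            first_row_of_group.push(i);
            next_id
        });
        group_ids.push(id);
    }
    (group_ids, first_row_of_group)
}
```

This still hashes each string's bytes once per row, but the per-row allocate/free pair disappears. `with_capacity(col.len())` is an upper bound that over-reserves when there are many duplicates, so a real implementation might size it from an estimate instead.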

