Rich-T-kid commented on PR #21765:
URL: https://github.com/apache/datafusion/pull/21765#issuecomment-4401926750

   Benchmarks
   <img width="941" height="1230" alt="Image 5-7-26 at 7 20 PM" src="https://github.com/user-attachments/assets/50125d91-f674-41e0-a030-8979170ad79c" />
   <img width="1131" height="1308" alt="Image 5-7-26 at 7 20 PM (1)" src="https://github.com/user-attachments/assets/1596ff90-ac19-454a-91ba-b00f4642173f" />
   <img width="955" height="565" alt="Image 5-7-26 at 7 20 PM (2)" src="https://github.com/user-attachments/assets/6eff0054-68c2-46bb-b24d-e692521caf59" />
   The benchmarks in `physical-plan/benches/dictionary_group_values.rs`, as well as `datafusion/benchmarks/dict.rs`, show a meaningful improvement. I still think there is room to make this more efficient, though. One idea is to store the intermediate bytes in a single buffer rather than in a vector of bytes, which removes the double memory allocation currently happening in `intern`. Another improvement is to cache the computed value hashes.
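The two ideas above can be sketched roughly as follows. This is a minimal, hypothetical Rust sketch, not the PR's `intern` implementation: `ByteInterner`, `intern`, `get` and `cached_hash` are illustrative names I am assuming here. It assumes all distinct values are appended to one contiguous `Vec<u8>` with an offsets array marking where each value starts and ends, and that each value's hash is computed once at insertion and stored so it never needs to be recomputed.

```rust
use std::collections::hash_map::RandomState;
use std::collections::HashMap;
use std::hash::{BuildHasher, Hash, Hasher};

/// Hypothetical interner (not DataFusion's actual type): every distinct value
/// is appended to one contiguous `buffer`, and its hash is cached on insert.
struct ByteInterner {
    /// All distinct values stored back to back in a single allocation.
    buffer: Vec<u8>,
    /// Value `i` occupies `buffer[offsets[i]..offsets[i + 1]]`.
    offsets: Vec<usize>,
    /// Hash of value `i`, computed once when the value is first interned.
    hashes: Vec<u64>,
    /// Maps a hash to the ids of all values with that hash (handles collisions).
    index: HashMap<u64, Vec<usize>>,
    state: RandomState,
}

impl ByteInterner {
    fn new() -> Self {
        Self {
            buffer: Vec::new(),
            offsets: vec![0],
            hashes: Vec::new(),
            index: HashMap::new(),
            state: RandomState::new(),
        }
    }

    fn hash_bytes(&self, value: &[u8]) -> u64 {
        let mut hasher = self.state.build_hasher();
        value.hash(&mut hasher);
        hasher.finish()
    }

    /// Number of distinct values interned so far.
    fn len(&self) -> usize {
        self.hashes.len()
    }

    /// Bytes of value `id`, borrowed from the shared buffer (no copy).
    fn get(&self, id: usize) -> &[u8] {
        &self.buffer[self.offsets[id]..self.offsets[id + 1]]
    }

    /// Cached hash of value `id`; reusing it avoids hashing the bytes again.
    fn cached_hash(&self, id: usize) -> u64 {
        self.hashes[id]
    }

    /// Returns the id of `value`, interning it first if it is new.
    fn intern(&mut self, value: &[u8]) -> usize {
        let hash = self.hash_bytes(value);
        // Only values with an equal hash need a byte-wise comparison.
        if let Some(ids) = self.index.get(&hash) {
            for &id in ids {
                if self.get(id) == value {
                    return id;
                }
            }
        }
        // New value: copy its bytes once into the shared buffer instead of
        // allocating a separate `Vec<u8>` for it.
        let id = self.hashes.len();
        self.buffer.extend_from_slice(value);
        self.offsets.push(self.buffer.len());
        self.hashes.push(hash);
        self.index.entry(hash).or_default().push(id);
        id
    }
}
```

For example, interning `b"apple"`, `b"banana"` and then `b"apple"` again yields ids 0, 1 and 0, with both distinct values living in the same `buffer`. The `HashMap<u64, Vec<usize>>` index is only a stand-in to keep the sketch dependency-free; a real version could instead key a raw hash table directly on the cached hashes.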


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
