[GitHub] [arrow-datafusion] Dandandan commented on pull request #5490: Memory limited hash join

via GitHub Tue, 07 Mar 2023 14:31:39 -0800


Dandandan commented on PR #5490:
URL: https://github.com/apache/arrow-datafusion/pull/5490#issuecomment-1458962589


   Nice PR!
   
   I think it would be great if we could run some benchmarks to show that we're 
not regressing too much (e.g. by running the TPC-H benchmark queries that 
contain joins). Some reasons I defaulted to initializing the hash map with a 
capacity based on the size of the left (build) side are as follows:
   * The build side (for the partition) already has to be loaded into memory, 
and it usually uses at least as much, and often more, memory than the hash table
   * In many cases (e.g. joins on unique identifiers) we need this capacity 
anyway, so the estimate is optimal
   * Growing and rebuilding the hash table can be slow (although some 
improvements have been made in this area)
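A minimal Rust sketch of the trade-off in the first and last points. It uses `std::collections::HashMap` as a stand-in, not DataFusion's actual join hash table, and a hypothetical build side of unique `u64` keys. It counts how often the map's capacity changes while the build side is inserted, as a proxy for resize-and-rehash steps. `count_growths` is a hypothetical helper for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical helper: inserts `keys` (row index as the value) and returns
/// how many times the map's capacity changed, i.e. how often it had to
/// reallocate and rehash its existing entries.
fn count_growths(map: &mut HashMap<u64, usize>, keys: &[u64]) -> usize {
    let mut growths = 0;
    let mut cap = map.capacity();
    for (row, &key) in keys.iter().enumerate() {
        map.insert(key, row);
        if map.capacity() != cap {
            growths += 1;
            cap = map.capacity();
        }
    }
    growths
}

fn main() {
    // Hypothetical build side: 100_000 unique join keys (e.g. identifiers).
    let build_keys: Vec<u64> = (0..100_000).collect();

    // Capacity pre-sized from the build-side row count.
    let mut presized: HashMap<u64, usize> = HashMap::with_capacity(build_keys.len());
    let presized_growths = count_growths(&mut presized, &build_keys);

    // Starting empty and growing on demand.
    let mut growing: HashMap<u64, usize> = HashMap::new();
    let growing_growths = count_growths(&mut growing, &build_keys);

    println!("pre-sized growths: {presized_growths}");
    println!("grow-on-demand growths: {growing_growths}");
}
```

`HashMap::with_capacity(n)` is documented to hold at least `n` elements without reallocating, so the pre-sized count is 0 when every key is unique. The grow-on-demand map resizes several times, and each resize rehashes everything inserted so far. The cost of pre-sizing is that the full allocation is made up front, which is what matters for a memory-limited join.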


