[GitHub] [arrow-datafusion] Dandandan commented on pull request #1831: determine build side in hash join by `total_byte_size` instead of `num_rows`

GitBox Tue, 15 Feb 2022 11:10:03 -0800


Dandandan commented on pull request #1831:
URL: https://github.com/apache/arrow-datafusion/pull/1831#issuecomment-1040566558



   Thanks @xudong963, that's a great point.
   I think the reason for picking the number of rows earlier is that a lot of 
other design docs talk about the number of rows rather than the size in bytes. 
I agree it makes more sense to look at the size in bytes.
   
   The number of rows is probably available as a statistic more often than the 
total size in bytes. So I think we should use the size in bytes if it is 
available, and otherwise estimate the size from the number of rows and the data 
types involved (e.g. int32 -> 4 bytes * number of rows).
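
   A rough sketch of that fallback, to make the idea concrete. Everything 
here is hypothetical and self-contained: `ColumnType`, `TableStats`, 
`estimated_byte_size` and `should_swap` are illustrative names, not the actual 
DataFusion types or API. It also assumes the left input is the build side and 
that variable-width columns (e.g. strings) simply make the estimate unavailable.

```rust
/// Simplified stand-in for a column data type (hypothetical, not arrow's `DataType`).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ColumnType {
    Int32,
    Int64,
    Float64,
    /// Variable-width: no fixed per-value size is known.
    Utf8,
}

impl ColumnType {
    /// Bytes per value for fixed-width types, `None` for variable-width ones.
    fn fixed_width(self) -> Option<usize> {
        match self {
            ColumnType::Int32 => Some(4),
            ColumnType::Int64 | ColumnType::Float64 => Some(8),
            ColumnType::Utf8 => None,
        }
    }
}

/// Hypothetical per-input statistics; either field may be missing.
#[derive(Debug, Clone, Default)]
pub struct TableStats {
    pub num_rows: Option<usize>,
    pub total_byte_size: Option<usize>,
}

/// Prefer the exact byte size; otherwise estimate it as
/// `num_rows * sum(fixed widths of all columns)`.
/// Returns `None` if neither statistic allows an estimate.
pub fn estimated_byte_size(stats: &TableStats, schema: &[ColumnType]) -> Option<usize> {
    if let Some(bytes) = stats.total_byte_size {
        return Some(bytes);
    }
    let rows = stats.num_rows?;
    let mut row_width = 0usize;
    for col in schema {
        // Bail out as soon as one column has no known fixed width.
        row_width += col.fixed_width()?;
    }
    Some(rows.saturating_mul(row_width))
}

/// Assuming the left input is the build side: swap only when both sizes
/// can be estimated and the left input is the larger one.
pub fn should_swap(
    left: (&TableStats, &[ColumnType]),
    right: (&TableStats, &[ColumnType]),
) -> bool {
    match (
        estimated_byte_size(left.0, left.1),
        estimated_byte_size(right.0, right.1),
    ) {
        (Some(l), Some(r)) => l > r,
        _ => false,
    }
}
```

   With this shape, an input that only reports `num_rows` can still be 
compared against one that reports `total_byte_size`, and the join order is left 
untouched whenever either side cannot be estimated.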
   
   
   
   


