zinking commented on PR #3038:
URL: https://github.com/apache/iceberg/pull/3038#issuecomment-1132688670

   > Thanks @wypoon! I'm okay with this, but I'd prefer to return a better 
estimate based on number of rows and not compressed size at all. Interested to 
hear what @aokolnychyi thinks.
   
   Well, rows * schema size overestimates table sizes under some 
circumstances, for example the TPCDS-sf1000 Q7 demographics dim table, where 
the inflated estimate exceeds the broadcast threshold and causes the broadcast 
join to degrade to a sort-merge join.
   
   I guess it still works in Spark 3.2 because of AQE. 
   
   totalSize * (read cols size / total cols size) is what Hive adopted, but 
this certainly underestimates in some circumstances. 
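
   To make the two formulas concrete, here is a minimal sketch of both 
estimates side by side. All function names and numbers are hypothetical and 
are not taken from Iceberg, Spark, or Hive code; the only real value is the 
10 MB threshold, which mirrors the default of 
`spark.sql.autoBroadcastJoinThreshold`.

```python
# Hypothetical sketch: function names and inputs are illustrative only.

# Mirrors Spark's default spark.sql.autoBroadcastJoinThreshold (10 MB).
BROADCAST_THRESHOLD_BYTES = 10 * 1024 * 1024


def row_based_estimate(row_count: int, schema_row_size: int) -> int:
    """rows * schema size: assumes every row takes the schema's default width."""
    return row_count * schema_row_size


def column_pruned_estimate(total_size: int, read_cols_size: int,
                           total_cols_size: int) -> int:
    """totalSize * (read cols size / total cols size).

    Scales the on-disk (possibly compressed) size by the fraction of the
    schema width that the query actually reads.
    """
    if total_cols_size <= 0:
        raise ValueError("total_cols_size must be positive")
    return total_size * read_cols_size // total_cols_size


def fits_broadcast(estimate: int,
                   threshold: int = BROADCAST_THRESHOLD_BYTES) -> bool:
    """True when the estimate is small enough to plan a broadcast join."""
    return estimate <= threshold


# Made-up dimension table: 2M rows, 100-byte schema width, 8 MB on disk,
# and a query that reads 8 of those 100 bytes per row.
by_rows = row_based_estimate(2_000_000, 100)
by_cols = column_pruned_estimate(8 * 1024 * 1024, 8, 100)

print(by_rows, fits_broadcast(by_rows))
print(by_cols, fits_broadcast(by_cols))
```

   With these made-up inputs the row-based estimate lands far above the 
threshold while the column-pruned estimate lands well below it. Because the 
second formula starts from the on-disk size, the in-memory size can be larger 
than it reports, which is the underestimation risk mentioned above.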


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

