zinking commented on PR #3038: URL: https://github.com/apache/iceberg/pull/3038#issuecomment-1132688670
> Thanks @wypoon! I'm okay with this, but I'd prefer to return a better estimate based on number of rows and not compressed size at all. Interested to hear what @aokolnychyi thinks.

Well, rows * schema size overestimates table size under some circumstances. For example, in TPCDS-sf1000 Q7 it overestimates the demographics dim table, which causes the broadcast join to degrade to a sort-merge join. I guess it still works in 3.2 because of AQE.

totalSize * readCols size / totalCols size is what Hive adopted, but that certainly underestimates in some circumstances.
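To make the two estimates concrete, here is a minimal sketch of both formulas. It is not the code in this PR: the function names, the sample numbers, and the 10 MB threshold (modeled on the default of `spark.sql.autoBroadcastJoinThreshold`) are assumptions chosen only for illustration.

```python
# Hypothetical helpers; none of these names come from Iceberg or Spark.

# Assumed broadcast threshold, modeled on Spark's 10 MB default.
BROADCAST_THRESHOLD_BYTES = 10 * 1024 * 1024


def estimate_by_rows(row_count: int, row_width_bytes: int) -> int:
    """rows * schema size: ignores compression and encoding entirely."""
    return row_count * row_width_bytes


def estimate_by_column_ratio(total_size_bytes: int,
                             read_cols_width_bytes: int,
                             all_cols_width_bytes: int) -> int:
    """totalSize * readCols size / totalCols size: scales the on-disk size
    by the fraction of the schema width that is actually read."""
    if all_cols_width_bytes <= 0:
        # Without a usable schema width, fall back to the full on-disk size.
        return total_size_bytes
    return total_size_bytes * read_cols_width_bytes // all_cols_width_bytes


def fits_broadcast(estimate_bytes: int,
                   threshold_bytes: int = BROADCAST_THRESHOLD_BYTES) -> bool:
    """True when the planner would still consider a broadcast join."""
    return estimate_bytes <= threshold_bytes


# Made-up dimension table: 2M rows, 100-byte schema width, 8 MB on disk,
# and a query that reads 20 of the 100 bytes per row.
by_rows = estimate_by_rows(2_000_000, 100)
by_ratio = estimate_by_column_ratio(8_000_000, 20, 100)
```

With these made-up numbers the row-based estimate lands far above the threshold while the ratio-based one stays below it, which is the kind of gap that can flip a broadcast join into a sort-merge join. The ratio-based estimate is still computed from the compressed on-disk size, so it can come out too small.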
