pvary commented on pull request #3328: URL: https://github.com/apache/iceberg/pull/3328#issuecomment-949532847
Sorry for the many short comments (not much time today), but let me summarise:

- We should double-check everything I state here against Hive 2.3.8/2.3.9 and 3.1.2.
- We were working on Hive 4.0.0 when we looked at statistics.
- We found that statistics estimation causes issues for tables with many files. If we turn off `hive.stats.estimate`, we no longer list the directories recursively, so that could fix the planning performance issue.
- We found that if `hive.stats.autogather` is true, the statistics are collected, but there was a problem with `rowDataSize`. We fixed this in HIVE-24928 by propagating the Iceberg table statistics so they are used as Hive statistics.
- For automatic column statistics we need HIVE-25276.
- We still have to consider other engines writing these tables, and we have to invalidate the column statistics if another engine updates the table. For this we created HIVE-25286.

I hope this finally helps 😄
