asolimando commented on PR #5444: URL: https://github.com/apache/hive/pull/5444#issuecomment-2353593637
> @asolimando Could you also please take a look? The code changes are related to your last commit on histogram-based stats, [HIVE-26221](https://issues.apache.org/jira/browse/HIVE-26221).
> IMO, the KLL stats are just a rough estimate, so after the datasketches upgrade, is it reasonable for the related UTs to change?

@zhangbutao, I have skimmed the changes. Although I agree that KLL stats are estimates, the unit tests I added for HIVE-26221 were hand-crafted and double-checked manually on a small sample, so I wouldn't expect changes there, especially since the change in the estimated number of rows surviving a particular filter is sometimes "big".

My suggestion would be to check whether the initialization changes have affected the way the KLL data sketch is computed, and to verify the results manually with small samples (even outside Hive, with just the library in some Java code).
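A minimal sketch of the exact baseline such a manual check could use. It relies only on the JDK and computes, by brute force, the fraction and number of rows that survive a filter on a small sample. The class name `ExactSelectivity`, its method names, and the sample values are hypothetical and not taken from Hive or from the PR. The idea is to feed the same values to the KLL sketch from the library version under test and compare its rank-based estimates against these exact numbers. Both the exclusive (`<`) and inclusive (`<=`) conventions are shown, on the assumption that the convention may differ between library versions.

```java
// Hypothetical helper (not part of Hive): exact selectivity on a small sample,
// intended as ground truth when comparing KLL sketch estimates by hand.
final class ExactSelectivity {

    // Fraction of values strictly less than v (exclusive rank).
    static double fractionBelow(float[] sample, float v) {
        if (sample.length == 0) {
            return 0.0;
        }
        int count = 0;
        for (float x : sample) {
            if (x < v) {
                count++;
            }
        }
        return (double) count / sample.length;
    }

    // Fraction of values less than or equal to v (inclusive rank).
    static double fractionAtOrBelow(float[] sample, float v) {
        if (sample.length == 0) {
            return 0.0;
        }
        int count = 0;
        for (float x : sample) {
            if (x <= v) {
                count++;
            }
        }
        return (double) count / sample.length;
    }

    // Exact number of rows surviving the filter lo <= x <= hi.
    static long rowsInRange(float[] sample, float lo, float hi) {
        long count = 0;
        for (float x : sample) {
            if (x >= lo && x <= hi) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up sample; replace with the values used in the unit test being checked.
        float[] sample = {1f, 2f, 2f, 3f, 5f, 8f, 13f, 21f};

        System.out.println("P(x <  3) = " + fractionBelow(sample, 3f));
        System.out.println("P(x <= 3) = " + fractionAtOrBelow(sample, 3f));
        System.out.println("rows in [2, 8] = " + rowsInRange(sample, 2f, 8f));
    }
}
```

If the sketch built from the same values reports noticeably different numbers before and after the upgrade, comparing against both the `<` and `<=` results may help tell a changed rank convention apart from a changed sketch construction.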
