asolimando commented on PR #5444:
URL: https://github.com/apache/hive/pull/5444#issuecomment-2353593637

   > @asolimando Could you also please take a look? The code changes are related 
to your last commit on histogram-based stats 
[HIVE-26221](https://issues.apache.org/jira/browse/HIVE-26221). IMO, the KLL 
stats are just a rough estimate, so after the DataSketches upgrade, is it 
reasonable for the related unit tests to change?
   
   @zhangbutao, I have skimmed the changes and, although I agree that KLL stats 
are estimates, the specific unit tests I added for HIVE-26221 were 
hand-crafted and double-checked manually on a small sample, so I wouldn't 
expect changes there (especially since the change in the estimated number of 
rows surviving a particular filter is sometimes "big").
   
   My suggestion would be to double-check whether the initialization changes have 
impacted the way the KLL data sketch is computed, and to verify the results 
manually on small samples (even outside Hive, with just the library in some Java code).
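
   As a starting point for that manual check, here is a minimal sketch of the 
exact reference values that a sketch estimate can be compared against on a small 
sample. It deliberately calls neither Hive nor the DataSketches library; the 
class and method names (`ExactRankCheck`, `exactRank`, `survivingRows`) are 
hypothetical, and the `inclusive` flag is only an assumption about one thing 
that might differ between library versions (whether a rank counts values `<` or 
`<=` the threshold), not a confirmed cause.

```java
// Exact reference values for a small sample, to compare by hand against
// whatever the KLL sketch reports. Illustrative names only; nothing here
// calls Hive or DataSketches.
final class ExactRankCheck {

    // Fraction of values v with v < threshold (exclusive)
    // or v <= threshold (inclusive).
    static double exactRank(float[] values, float threshold, boolean inclusive) {
        if (values.length == 0) {
            throw new IllegalArgumentException("empty sample");
        }
        int matching = 0;
        for (float v : values) {
            if (v < threshold || (inclusive && v == threshold)) {
                matching++;
            }
        }
        return (double) matching / values.length;
    }

    // Expected number of rows surviving "col < threshold" (or "<=")
    // for a table with numRows rows.
    static double survivingRows(float[] values, float threshold,
                                boolean inclusive, long numRows) {
        return exactRank(values, threshold, inclusive) * numRows;
    }

    public static void main(String[] args) {
        float[] sample = {1f, 2f, 2f, 3f, 4f, 5f, 7f, 9f};
        System.out.println(exactRank(sample, 2f, false)); // 0.125
        System.out.println(exactRank(sample, 2f, true));  // 0.375
        System.out.println(survivingRows(sample, 5f, true, 8L)); // 6.0
    }
}
```

   Feeding the same sample to the sketch before and after the upgrade, and 
comparing both results with these exact values, should show which side of the 
upgrade (if any) deviates.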


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

