zhangbutao commented on code in PR #5444:
URL: https://github.com/apache/hive/pull/5444#discussion_r1816406399


##########
ql/src/test/org/apache/hadoop/hive/ql/optimizer/calcite/stats/TestFilterSelectivityEstimator.java:
##########
@@ -159,7 +159,7 @@ public void testIsHistogramAvailableWhenEmptyArray() {
 
   @Test
   public void testLessThanSelectivity() {
-    Assert.assertEquals(0.6153846153846154, lessThanSelectivity(KLL, 3), DELTA);

Review Comment:
   I have found the root cause: some sketch functions have multiple `evaluate` 
methods, but in Hive we always pick the first `evaluate` method; see:
   
https://github.com/apache/hive/blob/548990dfad78b3d89a334c875d8a6708ef475e88/ql/src/java/org/apache/hadoop/hive/ql/exec/DataSketchesFunctions.java#L288-L297
   
   Here is an example: this commit 
https://github.com/apache/datasketches-hive/commit/b6c4d01ff9539d2aff520569277c14d479695bb0#diff-722be040808b3a7ef18a0e86ced7e686fcf553002f82d8332d02d29f28b265a9
 added another `evaluate` method with the newly added param 
`QuantileSearchCriteria.INCLUSIVE`, and as a result some qtests that use `cdf` 
now produce different results.
   
   Some other sketch functions may have even more `evaluate` methods, and we 
still choose the first one.
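   
   To make the problem concrete, here is a minimal sketch. It assumes the lookup simply walks `Class.getMethods()` and takes the first method named `evaluate`; `FakeCdfUdf`, `firstEvaluate`, `countEvaluate` and `evaluateWith` are hypothetical names for illustration, not Hive or datasketches-hive code, and the `boolean` parameter only stands in for the real new param. Since `getMethods()` does not guarantee any ordering, adding an overload can change which method is found first, whereas a lookup by explicit parameter types is stable.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.Optional;

public class EvaluateOverloadSketch {

  /** Hypothetical stand-in for a sketch UDF that has two evaluate overloads. */
  public static class FakeCdfUdf {
    public String evaluate(double splitPoint) {
      return "exclusive";
    }

    public String evaluate(double splitPoint, boolean inclusive) {
      return inclusive ? "inclusive" : "exclusive";
    }
  }

  /**
   * "Take the first evaluate" strategy (assumed, for illustration).
   * Which overload comes back depends on the order of getMethods(),
   * which the JDK does not specify.
   */
  static Optional<Method> firstEvaluate(Class<?> udf) {
    return Arrays.stream(udf.getMethods())
        .filter(m -> m.getName().equals("evaluate"))
        .findFirst();
  }

  /** Number of public methods named evaluate on the class. */
  static long countEvaluate(Class<?> udf) {
    return Arrays.stream(udf.getMethods())
        .filter(m -> m.getName().equals("evaluate"))
        .count();
  }

  /** Looks up one specific overload by its exact parameter types. */
  static Method evaluateWith(Class<?> udf, Class<?>... paramTypes)
      throws NoSuchMethodException {
    return udf.getMethod("evaluate", paramTypes);
  }

  public static void main(String[] args) throws Exception {
    Class<?> udf = FakeCdfUdf.class;

    System.out.println("evaluate overloads: " + countEvaluate(udf));

    // Not guaranteed to be the same overload on every JVM or after recompiling.
    Method first = firstEvaluate(udf).orElseThrow(IllegalStateException::new);
    System.out.println("first overload takes " + first.getParameterCount() + " parameter(s)");

    // Asking for a signature explicitly is deterministic.
    Method twoArgs = evaluateWith(udf, double.class, boolean.class);
    System.out.println(twoArgs.invoke(new FakeCdfUdf(), 3.0, true));
  }
}
```

   This only illustrates the ambiguity; it does not tell us which overload each sketch function should be bound to.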
   
   IMO, I don't see a good way to choose the best `evaluate` method; what I 
can do is regenerate the output of the sketch function qtests. :(
   
   Do you have any other good ideas? @asolimando 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

