asolimando commented on code in PR #5444:
URL: https://github.com/apache/hive/pull/5444#discussion_r1879080415


##########
ql/src/test/org/apache/hadoop/hive/ql/optimizer/calcite/stats/TestFilterSelectivityEstimator.java:
##########
@@ -159,7 +159,7 @@ public void testIsHistogramAvailableWhenEmptyArray() {
 
   @Test
   public void testLessThanSelectivity() {
-    Assert.assertEquals(0.6153846153846154, lessThanSelectivity(KLL, 3), DELTA);

Review Comment:
   > I have found the root cause: some sketch functions have multiple `evaluate` methods, but in Hive we always pick the first `evaluate` method, see:
   > 
   > 
https://github.com/apache/hive/blob/548990dfad78b3d89a334c875d8a6708ef475e88/ql/src/java/org/apache/hadoop/hive/ql/exec/DataSketchesFunctions.java#L288-L297
   > 
   > Here is an example: the commit [apache/datasketches-hive@b6c4d01#diff-722be040808b3a7ef18a0e86ced7e686fcf553002f82d8332d02d29f28b265a9](https://github.com/apache/datasketches-hive/commit/b6c4d01ff9539d2aff520569277c14d479695bb0#diff-722be040808b3a7ef18a0e86ced7e686fcf553002f82d8332d02d29f28b265a9) added another `evaluate` method with the newly added param `QuantileSearchCriteria.INCLUSIVE`, and as a result some qtests that use `cdf` change their output.
   > 
   > Some other sketch functions may have even more `evaluate` methods, and there too we choose the first one.
   > 
   > IMO, there is no good way to choose the best `evaluate` method; what I can do is regenerate the qtest output for the sketch functions. :(
   > 
   > Do you have any other good ideas? @asolimando
   
   Actually, the method you cite only infers the return type, so it is not exactly what we are looking for, although I guess the root cause is something similar, somewhere else.
   
   I have traced some functions that find the correct `evaluate` method by reflection, for UDFs and UDAFs respectively:
https://github.com/apache/hive/blob/master/udf/src/java/org/apache/hadoop/hive/ql/exec/UDFMethodResolver.java#L49
   and
https://github.com/apache/hive/blob/master/udf/src/java/org/apache/hadoop/hive/ql/exec/DefaultUDAFEvaluatorResolver.java#L57
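
To illustrate the difference between "take the first `evaluate`" and "match on the actual inputs", here is a minimal, self-contained sketch. It is not Hive's resolver code: `FakeCdfUdf`, `firstEvaluate` and `resolveEvaluate` are hypothetical names made up for the illustration, and the matching uses plain `Class.getMethod` with exact parameter types rather than Hive's type-aware matching.

```java
import java.lang.reflect.Method;

final class EvaluateOverloadSketch {

  /** Hypothetical stand-in for a sketch UDF that gained a second overload. */
  static final class FakeCdfUdf {
    public String evaluate(byte[] sketch, Float point) {
      return "evaluate(binary, float)";
    }

    public String evaluate(byte[] sketch, Boolean inclusive, Float point) {
      return "evaluate(binary, boolean, float)";
    }
  }

  /** Takes whichever evaluate overload reflection happens to list first. */
  static Method firstEvaluate(Class<?> udfClass) {
    // Class.getMethods() documents no particular order, so the result
    // can change when an overload is added to the class.
    for (Method m : udfClass.getMethods()) {
      if (m.getName().equals("evaluate")) {
        return m;
      }
    }
    throw new IllegalArgumentException("no evaluate method on " + udfClass);
  }

  /** Picks the overload whose parameter types exactly match the call site. */
  static Method resolveEvaluate(Class<?> udfClass, Class<?>... argTypes) {
    try {
      return udfClass.getMethod("evaluate", argTypes);
    } catch (NoSuchMethodException e) {
      throw new IllegalArgumentException("no matching evaluate overload", e);
    }
  }

  public static void main(String[] args) {
    Method first = firstEvaluate(FakeCdfUdf.class);
    Method twoArgs = resolveEvaluate(FakeCdfUdf.class, byte[].class, Float.class);
    Method threeArgs =
        resolveEvaluate(FakeCdfUdf.class, byte[].class, Boolean.class, Float.class);

    System.out.println("first listed takes " + first.getParameterCount() + " params");
    System.out.println("(binary, float) resolves to " + twoArgs.getParameterCount() + " params");
    System.out.println(
        "(binary, boolean, float) resolves to " + threeArgs.getParameterCount() + " params");
  }
}
```

The point of the sketch is that input-based matching is deterministic for a given call shape, so the overload that runs depends on which operands the plan actually contains.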
 
   
   The problem is not in those methods, as they receive the actual function inputs and simply match on them. What we would need to do is intercept the place where these UDFs/UDAFs are written into the AST/plan and replace `$udf($binary, $float)` with `$udf($binary, false, $float)`, so that the right `evaluate` function is then invoked.
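
The shape of that rewrite can be sketched on a toy operand list. This is only an illustration of the intended transformation, not code from Hive or Calcite: `SketchCallRewriteSketch`, `addFlag` and `render` are hypothetical names, operands are modelled as plain strings, and the `false` literal is the one from the example above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SketchCallRewriteSketch {

  /** Extra operand that selects the three-argument overload (assumed literal). */
  static final String FLAG = "false";

  /** Turns (binary, float) into (binary, false, float); leaves other shapes alone. */
  static List<String> addFlag(List<String> operands) {
    if (operands.size() != 2) {
      return operands; // already explicit, or not a shape this sketch handles
    }
    List<String> rewritten = new ArrayList<>(operands);
    rewritten.add(1, FLAG);
    return rewritten;
  }

  /** Renders a call the way it is written in the comment above. */
  static String render(String function, List<String> operands) {
    return function + "(" + String.join(", ", operands) + ")";
  }

  public static void main(String[] args) {
    List<String> before = Arrays.asList("$binary", "$float");
    System.out.println(render("$udf", before));          // $udf($binary, $float)
    System.out.println(render("$udf", addFlag(before))); // $udf($binary, false, $float)
  }
}
```

In the real plan the operands would be expression nodes rather than strings, and the open question is where such a rewrite should be hooked in.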
   
   I tried to track down the places where the sketches are introduced (e.g., rules like
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRewriteToDataSketchesRules.java),
   but without much luck so far.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

