asolimando commented on code in PR #5444:
URL: https://github.com/apache/hive/pull/5444#discussion_r1879080415
##########
ql/src/test/org/apache/hadoop/hive/ql/optimizer/calcite/stats/TestFilterSelectivityEstimator.java:
##########
@@ -159,7 +159,7 @@ public void testIsHistogramAvailableWhenEmptyArray() {
@Test
public void testLessThanSelectivity() {
- Assert.assertEquals(0.6153846153846154, lessThanSelectivity(KLL, 3),
DELTA);
Review Comment:
> I have found the root cause: some sketch functions have multiple
> `evaluate` methods, but in Hive we always get the first `evaluate` method, see:
>
> https://github.com/apache/hive/blob/548990dfad78b3d89a334c875d8a6708ef475e88/ql/src/java/org/apache/hadoop/hive/ql/exec/DataSketchesFunctions.java#L288-L297
>
> Here is an example: the commit
> [apache/datasketches-hive@b6c4d01#diff-722be040808b3a7ef18a0e86ced7e686fcf553002f82d8332d02d29f28b265a9](https://github.com/apache/datasketches-hive/commit/b6c4d01ff9539d2aff520569277c14d479695bb0#diff-722be040808b3a7ef18a0e86ced7e686fcf553002f82d8332d02d29f28b265a9)
> added another `evaluate` method with the newly added param
> `QuantileSearchCriteria.INCLUSIVE`, and as a result some Qtests which use `cdf`
> change their results.
>
> Some other sketch functions may have even more `evaluate` methods, and we
> also choose the first one.
>
> IMO, we cannot find a good way to choose the best `evaluate` method; what
> I can do is regenerate the output of the sketch function Qtests. :(
>
> Do you have any other good ideas? @asolimando
Actually, the method you cite only infers the return type, so it's not exactly
what we are looking for, although I guess the root cause is something similar,
somewhere else.
I have traced the functions that find, via reflection, the correct `evaluate`
method for UDFs and UDAFs, respectively
https://github.com/apache/hive/blob/master/udf/src/java/org/apache/hadoop/hive/ql/exec/UDFMethodResolver.java#L49
and
https://github.com/apache/hive/blob/master/udf/src/java/org/apache/hadoop/hive/ql/exec/DefaultUDAFEvaluatorResolver.java#L57
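
For illustration only, here is a minimal sketch of what resolving an `evaluate`
overload from the argument types looks like with plain Java reflection. It is
not Hive's resolver code: `FakeCdfUdf` and `resolveByArgTypes` are hypothetical
names, and the two overloads only mimic the shape of a sketch function that
gained an extra boolean parameter.

```java
import java.lang.reflect.Method;
import java.util.Arrays;

public class EvaluateOverloadSketch {

    // Hypothetical stand-in for a sketch UDF with two `evaluate` overloads.
    public static class FakeCdfUdf {
        public String evaluate(byte[] sketch, Float splitPoint) {
            return "evaluate(binary, float)";
        }

        public String evaluate(byte[] sketch, Boolean inclusive, Float splitPoint) {
            return "evaluate(binary, boolean, float)";
        }
    }

    // Hypothetical helper: pick the `evaluate` overload whose parameter types
    // are exactly the given argument types (no implicit conversions).
    public static Method resolveByArgTypes(Class<?> udfClass, Class<?>... argTypes) {
        for (Method m : udfClass.getMethods()) {
            if (m.getName().equals("evaluate")
                    && Arrays.equals(m.getParameterTypes(), argTypes)) {
                return m;
            }
        }
        throw new IllegalArgumentException(
            "No evaluate overload for " + Arrays.toString(argTypes));
    }

    public static void main(String[] args) {
        // Two arguments in the call select the two-parameter overload.
        Method twoArgs = resolveByArgTypes(FakeCdfUdf.class, byte[].class, Float.class);
        // An explicit boolean in the call selects the three-parameter overload.
        Method threeArgs =
            resolveByArgTypes(FakeCdfUdf.class, byte[].class, Boolean.class, Float.class);
        System.out.println(twoArgs.getParameterCount());
        System.out.println(threeArgs.getParameterCount());
    }
}
```

With this kind of matching, the overload that runs is decided entirely by the
arguments present in the call, so a different overload is only selected if the
call itself changes.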
The problem is not in those methods, as they receive the actual function
inputs and just match on them. What we would need to do is intercept where
these UDFs/UDAFs are written into the AST/plan and replace
`$udf($binary, $float)` with `$udf($binary, false, $float)`, so that the right
`evaluate` method is then invoked.
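
As a rough sketch of the rewrite I have in mind, assuming the call arguments
can be modelled as a plain list of strings (in Hive they would be AST or
Calcite nodes, which I am not showing here), the transformation amounts to
inserting an explicit boolean literal after the sketch argument. The class and
method names below are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SketchCallRewriteSketch {

    // Hypothetical rewrite: turn (sketch, value) into (sketch, inclusive, value).
    // Calls that do not have exactly two arguments are returned unchanged.
    public static List<String> withExplicitInclusiveFlag(List<String> args, boolean inclusive) {
        if (args.size() != 2) {
            return new ArrayList<>(args);
        }
        List<String> rewritten = new ArrayList<>(args);
        // Insert the boolean literal between the sketch and the value.
        rewritten.add(1, String.valueOf(inclusive));
        return rewritten;
    }

    public static void main(String[] args) {
        List<String> original = Arrays.asList("$binary", "$float");
        System.out.println("$udf" + original);
        System.out.println("$udf" + withExplicitInclusiveFlag(original, false));
    }
}
```

The open question remains where in the planner such a rewrite should be hooked
in; the sketch only shows the shape of the change to the argument list.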
I tried to track down the places where the sketches are introduced (e.g., rules
like
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRewriteToDataSketchesRules.java),
but without much luck so far.