zhangbutao commented on code in PR #5444:
URL: https://github.com/apache/hive/pull/5444#discussion_r1816406399
##########
ql/src/test/org/apache/hadoop/hive/ql/optimizer/calcite/stats/TestFilterSelectivityEstimator.java:
##########
@@ -159,7 +159,7 @@ public void testIsHistogramAvailableWhenEmptyArray() {
@Test
public void testLessThanSelectivity() {
- Assert.assertEquals(0.6153846153846154, lessThanSelectivity(KLL, 3),
DELTA);
Review Comment:
I have found the root cause: some sketch functions have multiple `evaluate`
methods, but in Hive we always pick the first `evaluate` method, see:
https://github.com/apache/hive/blob/548990dfad78b3d89a334c875d8a6708ef475e88/ql/src/java/org/apache/hadoop/hive/ql/exec/DataSketchesFunctions.java#L288-L297
Here is an example: this commit
https://github.com/apache/datasketches-hive/commit/b6c4d01ff9539d2aff520569277c14d479695bb0#diff-722be040808b3a7ef18a0e86ced7e686fcf553002f82d8332d02d29f28b265a9
added another `evaluate` method with the newly added param
`QuantileSearchCriteria.INCLUSIVE`, and as a result some Qtests that use `cdf`
now produce different results.
Some other sketch functions may have even more `evaluate` methods, and we still
choose the first one.
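To make the problem concrete, here is a minimal sketch of a name-only lookup. It is not the actual Hive code: `FakeCdfUdf`, `firstEvaluate` and `evaluateWith` are hypothetical names, and the `boolean` flag is only a stand-in for `QuantileSearchCriteria`. Note that the Javadoc for `Class.getMethods()` says the returned array is in no particular order, so which overload comes "first" is not something we control.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.Optional;

class EvaluateLookupDemo {

  // Hypothetical stand-in for a sketch UDF that gained a second evaluate overload.
  static class FakeCdfUdf {
    // Older overload: no search-criteria argument.
    public String evaluate(double[] splitPoints) {
      return "old overload";
    }

    // Newer overload: extra flag (stand-in for QuantileSearchCriteria).
    public String evaluate(double[] splitPoints, boolean inclusive) {
      return "new overload, inclusive=" + inclusive;
    }
  }

  // Name-only lookup: returns whichever "evaluate" the JVM happens to list first.
  static Optional<Method> firstEvaluate(Class<?> clazz) {
    return Arrays.stream(clazz.getMethods())
        .filter(m -> m.getName().equals("evaluate"))
        .findFirst();
  }

  // Signature-based lookup: pins one specific overload by its parameter types.
  static Method evaluateWith(Class<?> clazz, Class<?>... paramTypes)
      throws NoSuchMethodException {
    return clazz.getMethod("evaluate", paramTypes);
  }

  public static void main(String[] args) throws Exception {
    Method first = firstEvaluate(FakeCdfUdf.class)
        .orElseThrow(IllegalStateException::new);
    System.out.println("name-only lookup picked: " + first);

    Method pinned = evaluateWith(FakeCdfUdf.class, double[].class);
    System.out.println("signature lookup picked: " + pinned);
  }
}
```

The second helper only shows what pinning an overload by signature would look like. It does not tell us which overload Hive should prefer for each sketch function.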
IMO, there is no good way to choose the best `evaluate` method, so what I
can do is regenerate the Qtest output for the sketch functions. :(
Do you have any other good ideas? @asolimando
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]