okumin commented on code in PR #5444:
URL: https://github.com/apache/hive/pull/5444#discussion_r1877236658


##########
ql/src/test/org/apache/hadoop/hive/ql/optimizer/calcite/stats/TestFilterSelectivityEstimator.java:
##########
@@ -159,7 +159,7 @@ public void testIsHistogramAvailableWhenEmptyArray() {
 
   @Test
   public void testLessThanSelectivity() {
-    Assert.assertEquals(0.6153846153846154, lessThanSelectivity(KLL, 3), DELTA);

Review Comment:
   Let me clarify the problem, taking `GetCdfUDF` as an example. Originally, we 
provided only `(BytesWritable, Float...) -> List<Double>`. Datasketches Hive 2.0 
provides both `(BytesWritable, Float...) -> List<Double>` and `(BytesWritable, 
Boolean, Float...) -> List<Double>`, and the first one is now an alias of 
`(BytesWritable, true, Float...) -> List<Double>`, i.e. the INCLUSIVE mode. With 
Datasketches Hive 1.2, the two-parameter method behaved like `(BytesWritable, 
false, Float...) -> List<Double>`, i.e. the exclusive mode. So the original unit 
test failed because the two-parameter method started using the INCLUSIVE mode.
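   
   To make the difference between the two modes concrete, here is a minimal 
sketch in plain Java. It does not use the Datasketches API; the class name 
`CdfModes`, the helper `cdf`, and the sample values are hypothetical and only 
illustrate how an exact CDF changes when the split point itself is counted.

```java
import java.util.Arrays;

public class CdfModes {
    // Hypothetical helper, not part of Datasketches: exact fraction of values
    // that are < splitPoint (exclusive) or <= splitPoint (inclusive).
    public static double cdf(double[] values, double splitPoint, boolean inclusive) {
        long count = Arrays.stream(values)
            .filter(v -> inclusive ? v <= splitPoint : v < splitPoint)
            .count();
        return (double) count / values.length;
    }

    public static void main(String[] args) {
        double[] values = {1, 2, 3, 3, 5}; // made-up data, not the test fixture
        System.out.println(cdf(values, 3, false)); // exclusive: 2 of 5 -> 0.4
        System.out.println(cdf(values, 3, true));  // inclusive: 4 of 5 -> 0.8
    }
}
```

   With the same input, a "less than" estimate computed from the CDF shifts as 
soon as the default mode flips, which is the kind of change the test observed.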
   
   One more point: I guess we don't have to keep 100% compatibility for the 
results of sketch UDFs, because they are approximate anyway. However, we will 
encounter incompatibilities wherever the sketches themselves, not the final 
results, are stored directly as binary. Information stored in HMS is the primary 
example. Do we have any other case where sketches are stored directly on the 
system side, e.g. materialized views?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

