AlexanderSaydakov opened a new pull request, #15257:
URL: https://github.com/apache/druid/pull/15257

   This PR upgrades Druid to the latest datasketches-java version, 4.2.0.
   This was supposed to be a minor version change, but some API changes were
inadvertently introduced. As a result, I had to implement a few newly required
methods in the custom ArrayOfStringTuplesSerDe. These need careful review,
since I am not entirely sure I understood the serial format correctly.
   Also, one test is currently failing: preservesMinAndMaxWhenAssumeGroupedFalse.
I don't understand the purpose of this test or what its name means. However,
it asks a quantile sketch to partition 66 items into 66 partitions and expects
exactly one item in each. If we allow even the slightest error (and sketches
are approximate), some partitions can end up with 2 items and others empty.
With deduplication, this results in fewer partitions than requested.
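   To illustrate the effect described above, here is a minimal, self-contained sketch. It does not use datasketches or Druid; the helpers `split_points` and `partition_sizes` are hypothetical, and the "sketch error" is simulated by nudging every 11th boundary one item to the right. With exact ranks, 66 items split into 66 partitions of one item each; with a one-item rank error, duplicate boundaries create empty partitions next to 2-item partitions, and deduplicating the boundaries leaves fewer partitions.

```python
from bisect import bisect_right


def split_points(sorted_items, num_partitions, shift_every=0, dedup=True):
    """Hypothetical stand-in for sketch-based partitioning: boundaries sit at
    normalized ranks i/num_partitions. When shift_every > 0, every
    shift_every-th boundary is nudged one item to the right to mimic a tiny
    rank error from an approximate quantile sketch."""
    n = len(sorted_items)
    points = []
    for i in range(1, num_partitions):
        idx = i * n // num_partitions
        if shift_every and i % shift_every == 0:
            idx = min(n - 1, idx + 1)  # simulated off-by-one rank error
        points.append(sorted_items[idx])
    # Optionally drop identical boundaries, which would otherwise bound empty partitions.
    return sorted(set(points)) if dedup else points


def partition_sizes(sorted_items, points):
    """Count items per partition, where partition k holds points[k-1] <= x < points[k]."""
    sizes = [0] * (len(points) + 1)
    for x in sorted_items:
        sizes[bisect_right(points, x)] += 1
    return sizes


items = list(range(66))
exact = partition_sizes(items, split_points(items, 66))
noisy = partition_sizes(items, split_points(items, 66, shift_every=11))
raw = partition_sizes(items, split_points(items, 66, shift_every=11, dedup=False))

print(len(exact), len(noisy))       # 66 61
print(raw.count(0), raw.count(2))   # 5 5
```

Without deduplication there are still 66 partitions, but five are empty and five hold 2 items; with deduplication only 61 partitions remain, which is the kind of result that an "exactly one item per partition" assertion rejects.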
   This change in behavior from 4.1.0 to 4.2.0 is unfortunate but not
incorrect; this is a degenerate use case. A better test could generate, say,
1000 items, ask for 10 partitions, and assert that each partition has
100 ± 2 items or so. Perhaps the behavior with very small partitions can be
improved in the next version, but for now I suggest using 4.2.0 and changing
this test.
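   The tolerance-based check proposed above could look roughly like the following Python sketch. It is not the actual Druid test: `assert_roughly_even` and `partition_sizes` are hypothetical helper names, and instead of querying a real quantile sketch, the split points are hand-made with each boundary off by one rank to stand in for sketch error. The point is that a ±2 tolerance absorbs small rank errors while still catching badly skewed or missing partitions.

```python
from bisect import bisect_right


def partition_sizes(sorted_items, points):
    """Count items per partition, where partition k holds points[k-1] <= x < points[k]."""
    sizes = [0] * (len(points) + 1)
    for x in sorted_items:
        sizes[bisect_right(points, x)] += 1
    return sizes


def assert_roughly_even(sorted_items, points, num_partitions, tolerance):
    """Hypothetical test helper: checks the partition count and that every
    partition is within +/- tolerance of len(sorted_items) / num_partitions."""
    sizes = partition_sizes(sorted_items, points)
    assert len(sizes) == num_partitions, f"expected {num_partitions} partitions, got {len(sizes)}"
    target = len(sorted_items) / num_partitions
    for k, size in enumerate(sizes):
        assert abs(size - target) <= tolerance, f"partition {k} has {size} items, target {target}"
    return sizes


items = list(range(1000))
# Stand-in for sketch output: boundary i is one rank below or above the exact i * 100.
points = [items[i * 100 + (-1) ** i] for i in range(1, 10)]
sizes = assert_roughly_even(items, points, num_partitions=10, tolerance=2)
print(sizes)  # [99, 102, 98, 102, 98, 102, 98, 102, 98, 101]
```

In a real test the `points` list would come from the sketch under test; the tolerance would then need to be chosen with the sketch's documented rank error in mind rather than fixed at 2.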


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
