AlexanderSaydakov opened a new pull request, #15257: URL: https://github.com/apache/druid/pull/15257
This change moves to the latest datasketches-java version, 4.2.0. It was supposed to be a minor version change, but some API changes were inadvertently introduced, so I had to implement a few newly required methods in the custom ArrayOfStringTuplesSerDe. They need a careful review, since I am not entirely sure I understood the serial format correctly.

One test is currently failing: preservesMinAndMaxWhenAssumeGroupedFalse. I don't understand the purpose of this test or what its name means. It asks a quantile sketch to partition 66 items into 66 partitions and expects exactly one item in each. If we allow even the slightest error (and sketches are approximate), we can get some partitions with 2 items and some empty ones, so after deduplication there are fewer partitions. This change in behavior from 4.1.0 to 4.2.0 is unfortunate, but not incorrect. This is a degenerate use case.

I think a better test would generate, say, 1000 items, ask for 10 partitions, and assert that each partition has 100 ± 2 items, or something like that. Perhaps the behavior with very small partitions can be improved in the next version, but for now I suggest using 4.2.0 and changing this test somehow.
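To make the argument about the failing test concrete, here is a minimal sketch in plain Java. It does not use datasketches-java or any Druid code. The class name, the method names, and the `rankGranularity` parameter are all hypothetical, and rounding each split rank down to a multiple of `rankGranularity` is only an assumed stand-in for sketch approximation error, not how the library actually behaves. It shows how a small rank error collapses 66 requested partitions of 66 items into fewer partitions after deduplication, while 1000 items in 10 partitions stay within a tolerance of 2.

```java
import java.util.TreeSet;

final class PartitionToleranceExample {

    // Hypothetical stand-in for an approximate quantile sketch: picks the upper
    // bound of each interior partition, rounding its rank index down to a
    // multiple of rankGranularity to imitate approximation error. Duplicate
    // bounds are removed (the TreeSet deduplicates and keeps them sorted).
    static long[] approximateUpperBounds(long[] sorted, int numPartitions, int rankGranularity) {
        TreeSet<Long> bounds = new TreeSet<>();
        int n = sorted.length;
        for (int i = 1; i < numPartitions; i++) {
            int idealIndex = (int) ((long) i * n / numPartitions) - 1;
            int approxIndex = Math.max(0, (idealIndex / rankGranularity) * rankGranularity);
            bounds.add(sorted[approxIndex]);
        }
        long[] result = new long[bounds.size()];
        int k = 0;
        for (long b : bounds) {
            result[k++] = b;
        }
        return result;
    }

    // Counts items per partition. Partition p holds the items that are greater
    // than upperBounds[p - 1] and at most upperBounds[p]; the last partition
    // holds everything above the final bound. Both arrays must be sorted.
    static int[] countPerPartition(long[] sorted, long[] upperBounds) {
        int[] counts = new int[upperBounds.length + 1];
        int p = 0;
        for (long v : sorted) {
            while (p < upperBounds.length && v > upperBounds[p]) {
                p++;
            }
            counts[p]++;
        }
        return counts;
    }

    // True if every partition size is within expected +/- tolerance.
    static boolean allWithin(int[] counts, int expected, int tolerance) {
        for (int c : counts) {
            if (Math.abs(c - expected) > tolerance) {
                return false;
            }
        }
        return true;
    }

    // Distinct, sorted test items 0, 1, ..., n - 1.
    static long[] sequence(int n) {
        long[] items = new long[n];
        for (int i = 0; i < n; i++) {
            items[i] = i;
        }
        return items;
    }

    public static void main(String[] args) {
        // Degenerate case: 66 items into 66 partitions with a small rank error.
        long[] small = sequence(66);
        int[] smallCounts = countPerPartition(small, approximateUpperBounds(small, 66, 2));
        System.out.println("66 items, 66 requested partitions, got " + smallCounts.length);

        // Tolerance-based case: 1000 items into 10 partitions, 100 +/- 2 each.
        long[] large = sequence(1000);
        int[] largeCounts = countPerPartition(large, approximateUpperBounds(large, 10, 2));
        System.out.println("1000 items, 10 partitions, within 100 +/- 2: "
            + allWithin(largeCounts, 100, 2));
    }
}
```

With the same assumed error, the exact-count expectation fails for the tiny input but the tolerance-based assertion holds for the larger one, which is the property a revised test could check.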
