Re: [I] groupBy HLL merge produces inconsistent results (druid)

via GitHub Mon, 26 Feb 2024 10:03:38 -0800


gianm commented on issue #13987:
URL: https://github.com/apache/druid/issues/13987#issuecomment-1964800158


   In general, HLL operations are not expected to produce exactly the same 
estimate every time.
   
   I asked the datasketches folks about this once; you can see the response 
if you have an account on ASF Slack: 
https://the-asf.slack.com/archives/CP0930GKG/p1682452090261029. TL;DR: 
repeatable, order-insensitive results are not something the datasketches 
team is going for with most of their sketches, so in Druid land we inherit this.
   
   A couple of relevant comments in the discussion, from a datasketches 
developer:
   
   > Sketches, by their design are approximate and/or probabilistic, and in 
general, the results from a sketch should be viewed as such.  This means that 
the user should not expect either exact results, because sketches are 
approximate, nor expect exact repeatability of results even with identical 
inputs, because sketches should be treated as probabilistic.
   
   & 
   
   > As soon as the user starts demanding exactness in terms of either accuracy 
or repeatability of results, it requires specific details of the sketch 
algorithm, and may not be achievable without compromising other properties of 
the sketch as Jon mentions.
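
   One practical consequence for anything that compares HLL estimates across 
runs (tests, validation jobs): compare within a tolerance instead of checking 
for equality. A minimal Python sketch of that idea follows. The helper name 
`estimates_agree` and the 5% default tolerance are illustrative assumptions, 
not part of Druid or datasketches; a real tolerance should come from the error 
bounds of the sketch configuration in use.

```python
import math


def estimates_agree(a: float, b: float, rel_tol: float = 0.05) -> bool:
    """Return True when two approximate distinct-count estimates are close.

    rel_tol is a hypothetical default chosen for illustration only; it is
    not derived from any particular sketch's documented error bounds.
    """
    # math.isclose scales rel_tol by the larger of the two magnitudes.
    return math.isclose(a, b, rel_tol=rel_tol)


# Two runs over the same data may merge sketches in a different order
# and report slightly different estimates (values here are made up).
run_1 = 10_000.0
run_2 = 10_230.0
print(estimates_agree(run_1, run_2))
```

   With a check like this, small run-to-run differences pass, while a large 
discrepancy still fails and can be investigated separately.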


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

