hfukada commented on PR #15049:
URL: https://github.com/apache/druid/pull/15049#issuecomment-1747188608

   No, I did not try it.
   
   I did try KLL sketches, which ReqSketch is based on, in Druid, but their 
insertion/merge time guarantees were not as strong as DDSketch's. Ultimately, 
I am not interested in rank-error guarantees.
   
   Because the ReqSketch paper talks about "no generalizations about data 
distribution", I lose interest in its ability to accurately characterize 
long-tail distributions. Casually mentioning that [p9X values are exactly 
what ReqSketch is good at], without demonstrating its ability to handle these 
hard cases, leaves me unsatisfied with the results presented.
   
   In addition, it's another "random" algorithm, as in there is some element 
of randomness built into the sketch. I believe there is strong value in being 
able to return deterministic values.
   
   I had to chuckle at this line talking about DDSketch's shortcomings:
   > This definition only makes sense for data universes with a notion of 
magnitude and distance (e.g., numerical data)
   
   DDSketch is not the answer for every problem, I agree. It is certainly not 
for sorting or taking the p99 of strings, and maybe it is not even generally 
applicable. However, I deal exactly with percentiles above p50, and I care 
most about p9x values and about doing these calculations fast. It is the 
algorithm I desire most. After having played with it in my own clusters, I am 
happy with its accuracy, storage, and query performance.
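
   To illustrate what I mean by deterministic results and relative (rather 
than rank) error, here is a minimal sketch of the log-bucket idea behind 
DDSketch. It is not the implementation in this PR nor any library's API: the 
class `TinyRelativeErrorSketch` and its methods are hypothetical names, it only 
handles positive values, and it never collapses buckets to bound memory.

```python
import math
from collections import Counter


class TinyRelativeErrorSketch:
    """Hypothetical, simplified log-bucket sketch (positive values only)."""

    def __init__(self, alpha=0.01):
        # alpha is the target relative accuracy of returned quantile values.
        self.alpha = alpha
        self.gamma = (1 + alpha) / (1 - alpha)
        self.log_gamma = math.log(self.gamma)
        self.buckets = Counter()  # bucket index -> number of values
        self.count = 0

    def add(self, value):
        if value <= 0:
            raise ValueError("this simplified sketch only accepts positive values")
        # Bucket i covers the interval (gamma**(i-1), gamma**i].
        index = math.ceil(math.log(value) / self.log_gamma)
        self.buckets[index] += 1
        self.count += 1

    def merge(self, other):
        # Merging adds per-bucket counts; both sketches must share alpha.
        if other.alpha != self.alpha:
            raise ValueError("cannot merge sketches with different alpha")
        self.buckets.update(other.buckets)
        self.count += other.count

    def quantile(self, q):
        if self.count == 0:
            raise ValueError("empty sketch")
        rank = q * (self.count - 1)
        seen = 0
        index = None
        for index in sorted(self.buckets):
            seen += self.buckets[index]
            if seen > rank:
                break
        # Bucket representative; within alpha of every value in the bucket.
        return 2 * self.gamma ** index / (self.gamma + 1)
```

   There is no sampling or coin flipping anywhere, so the same input always 
produces the same buckets and the same answer, regardless of how the data was 
split and merged. The error bound is on the returned value itself, which is 
what matters for p9x on long-tail data.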


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

