hfukada commented on PR #15049: URL: https://github.com/apache/druid/pull/15049#issuecomment-1747188608
I did not try it, no. I did try KLL sketches, which ReqSketch is based on, in Druid, but their insertion and merge time guarantees were not as strong as DDSketch's. I am ultimately not interested in rank error. Because the ReqSketch paper talks about making "no generalizations about data distribution", it loses my interest regarding its ability to accurately characterize long-tail distributions. Casually mentioning that [p9X values are exactly what ReqSketch is good at], while not demonstrating its ability to handle these hard cases, leaves me unsatisfied with the results presented.

In addition, it's another randomized algorithm, as in there is some element of randomness built into the sketch. I believe there is strong value in being able to return deterministic values.

I had to chuckle at this line about DDSketch's shortcomings:

> This definition only makes sense for data universes with a notion of magnitude and distance (e.g., numerical data)

DDSketch is not the answer to every problem, I agree. It is certainly not for sorting or taking the p99 of strings, and maybe not even generally applicable. However, I deal specifically with percentiles above 0.5, care most about p9X values, and need those calculations to be fast. It is the algorithm I want most: having played with it in my own clusters, I am happy with its accuracy, storage, and query performance.
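To make the relative-error and determinism points concrete, here is a minimal sketch of DDSketch-style log bucketing. It is an illustration only, not Druid's extension or any published DDSketch library: the class name `MiniDDSketch` and its methods are hypothetical, it accepts positive values only, and it omits the bucket-collapsing step that real implementations use to cap memory. With gamma = (1 + alpha) / (1 - alpha), a value x goes to bucket ceil(log_gamma(x)), and a quantile is answered with 2 * gamma^i / (gamma + 1), which is within relative error alpha of the true value.

```python
import math
from collections import Counter


class MiniDDSketch:
    """Hypothetical, minimal DDSketch-style sketch (positive values, no collapsing)."""

    def __init__(self, relative_accuracy: float = 0.01):
        self.alpha = relative_accuracy
        self.gamma = (1 + relative_accuracy) / (1 - relative_accuracy)
        self.log_gamma = math.log(self.gamma)
        self.buckets = Counter()  # bucket index -> count
        self.count = 0

    def _index(self, value: float) -> int:
        # Bucket i covers the interval (gamma^(i-1), gamma^i].
        return math.ceil(math.log(value) / self.log_gamma)

    def add(self, value: float) -> None:
        if value <= 0:
            raise ValueError("this sketch only handles positive values")
        # O(1) insertion: the bucket is a pure function of the value (no randomness).
        self.buckets[self._index(value)] += 1
        self.count += 1

    def merge(self, other: "MiniDDSketch") -> None:
        if other.gamma != self.gamma:
            raise ValueError("can only merge sketches with the same relative accuracy")
        # Merging is just adding counts per bucket index.
        self.buckets.update(other.buckets)
        self.count += other.count

    def quantile(self, q: float) -> float:
        if self.count == 0:
            raise ValueError("empty sketch")
        if not 0 <= q <= 1:
            raise ValueError("q must be in [0, 1]")
        rank = q * (self.count - 1)
        cumulative = 0
        for i in sorted(self.buckets):
            cumulative += self.buckets[i]
            if cumulative > rank:
                # Estimate with relative error <= alpha for any x in this bucket.
                return 2 * self.gamma ** i / (self.gamma + 1)
        raise AssertionError("unreachable: counts must cover every rank")


sketch = MiniDDSketch(relative_accuracy=0.01)
for v in range(1, 10001):
    sketch.add(v)
print(sketch.quantile(0.99))  # within 1% of the exact p99, which is 9900
```

Because the bucket index depends only on the value, the same input always produces the same buckets and the same answer, and merging two sketches gives exactly the buckets of a single sketch built from all the data. Randomized sketches such as KLL or ReqSketch do not offer that property.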
