thomasrebele commented on issue #700:
URL: 
https://github.com/apache/datasketches-java/issues/700#issuecomment-3670891384

   Thanks to everyone for all the background information. I'm not yet sure 
whether I'll just limit the resolution of the selectivity estimate of the 
predicate to the resolution of the KLL sketch (e.g., the 1.65% for k=200), or 
whether I use interpolation. The results don't need to be perfect (those of the 
KLL sketch aren't either), just good enough for the purpose of selectivity 
estimation.
   
   Thank you for the suggestion, @leerho, I have already used that to implement 
my own interpolation. I did some experiments with cubic splines interpolation. 
At first I used some artificial histograms and compared the interpolated vs the 
non-interpolated average error on a Gaussian distribution. Interpolation seemed 
to reduce the error noticeably. However, when trying to do the same with the 
KLL sketch, the two errors were almost the same. The cause seems to be how the 
histogram bins are chosen: my first experiment had equi-distant bins, while the 
width of the KLL sketch histogram bins may vary. There might be some ways to 
improve the interpolation, but as of now I didn't follow that path.
   
   I've stumbled upon the t-digest in the tutorial that @leerho mentioned. The 
t-digest already supports interpolation. It seems to provide much better 
accuracy for large N, at least in my experiments. If I find a way to work 
around the accuracy problems for small N, it might be a valuable alternative to 
the KLL sketch. The accuracy problems are nicely shown in 
https://datasketches.apache.org/docs/QuantilesStudies/KllSketchVsTDigest.html#t-digest-compression100.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to