leerho commented on issue #703: URL: https://github.com/apache/datasketches-java/issues/703#issuecomment-3684750328
This is why we have the KLL, REQ, and T-digest sketches in the library. One size doesn't fit all. The "unbounded (log N) size" sounds a bit harsh, although theoretically correct. I have characterized the growth path of our KLL sketches with streams up to over 10^12 items: The following graph illustrates the typical size of KLL Floats and KLL Doubles Sketches vs number of items. Once the sketch, of a given K, reaches its "full compaction shape" its size grows very, very slowly. As you can see here, a KllDoubles sketch of 10^10 items would be about 6KB in size. The KllFloats sketch is about half that. This means for practical stream sizes, the KLL sketches stay pretty small. <img width="960" height="540" alt="Image" src="https://github.com/user-attachments/assets/935769ca-a99e-4ce8-83eb-502254446665" /> The growth path for KLL can be estimated from equations and plots that predict the size of the sketch fairly accurately. The following plot shows the growth path normalized per byte of item size. This can be used even for items of variable size as long as you can estimate the average item size. <img width="2805" height="1847" alt="Image" src="https://github.com/user-attachments/assets/b24fe673-1895-4b4f-8150-ccb9dd873703" /> A KLL sketch of a larger"K", would shift this graph to the right. A smaller "K" would shift it to the left. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
