leerho commented on PR #554: URL: https://github.com/apache/datasketches-java/pull/554#issuecomment-2093715422
Yes. When I wrote that function, I couldn't figure out an easy way to handle variable-sized items, and that pretty much ruled out generic items sketches. Since then, I have developed a more sophisticated growth model for the sketch that can predict any point on the growth curve for an assumed item size, which is what you see in those graphs.

For your case, given a distribution of item sizes and this new version of getMaxSerializedSizeBytes(k, n, itemSize), we can pick a reasonable quantile from the size distribution and plug it into the new function. It could be the max item size, but if your distribution is closer to a power law, that would be wasteful. So we could pick the median, or whatever quantile fits.

Once we have the full item-size distribution, we can ask questions like: "If I choose a quantile from the size distribution and my _**n**_ turns out in practice to be doubled, what would the size of the sketch be?" Easy-peasy.
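
To make the quantile-then-plug-in idea concrete, here is a minimal sketch of how it could look. Everything in it is an assumption for illustration: `SketchSizePlanner`, `SizeModel`, `itemSizeQuantile`, and `toyModel` are hypothetical names, and `toyModel` is a crude placeholder, not the growth model or the `getMaxSerializedSizeBytes(k, n, itemSize)` implementation from this PR. The real function would be passed in wherever `SizeModel` is accepted.

```java
import java.util.Arrays;

final class SketchSizePlanner {

  /** Hypothetical stand-in for a size function of the form (k, n, itemSize) -> bytes. */
  interface SizeModel {
    long maxSerializedSizeBytes(int k, long n, int itemSizeBytes);
  }

  /** Nearest-rank quantile of the observed item sizes; rank must be in [0, 1]. */
  static int itemSizeQuantile(int[] itemSizes, double rank) {
    if (itemSizes.length == 0) {
      throw new IllegalArgumentException("no item sizes observed");
    }
    if (rank < 0.0 || rank > 1.0) {
      throw new IllegalArgumentException("rank must be in [0, 1]");
    }
    int[] sorted = itemSizes.clone();
    Arrays.sort(sorted);
    int idx = (int) Math.ceil(rank * sorted.length) - 1;
    return sorted[Math.max(0, idx)];
  }

  /** Pick an item size at the given rank and plug it into the supplied size model. */
  static long estimate(SizeModel model, int k, long n, int[] itemSizes, double rank) {
    return model.maxSerializedSizeBytes(k, n, itemSizeQuantile(itemSizes, rank));
  }

  /**
   * Toy placeholder, NOT the library's formula: assumes an 8-byte header and
   * roughly k retained items per doubling of n beyond k.
   */
  static long toyModel(int k, long n, int itemSizeBytes) {
    int levels = 64 - Long.numberOfLeadingZeros(n / k); // floor(log2(n / k)) + 1 when n >= k
    long retained = n <= k ? n : Math.min(n, (long) k * levels);
    return 8L + retained * itemSizeBytes;
  }

  public static void main(String[] args) {
    int[] observedSizes = {4, 8, 8, 16, 200}; // made-up, skewed item sizes in bytes
    int k = 200;
    long n = 1000;
    SizeModel model = SketchSizePlanner::toyModel;
    System.out.println("median, n:   " + estimate(model, k, n, observedSizes, 0.5));
    System.out.println("median, 2n:  " + estimate(model, k, 2 * n, observedSizes, 0.5));
    System.out.println("max size, n: " + estimate(model, k, n, observedSizes, 1.0));
  }
}
```

With a skewed size distribution like the made-up one here, sizing by the maximum item size gives a far larger estimate than sizing by the median, which is the wastefulness mentioned above. The "what if _**n**_ doubles" question is just a second call with `2 * n`.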
