leerho commented on PR #554:
URL: 
https://github.com/apache/datasketches-java/pull/554#issuecomment-2093715422

   Yes.  When I wrote that function, I couldn't figure out an easy way to 
handle variable sized items and that pretty much ruled out generic items 
sketches.  Since then, I have developed a more sophisticated growth model for 
the sketch, where I can predict any point on the growth curve assuming an item 
size -- what you see in those graphs.  For your case, coupled with a 
distribution of item sizes, and with this new version of 
getMaxSerializedSizeBytes(k, n, itemSize), we can pick a reasonable quantile 
from the size distribution and plug it into this new function. It could be the 
max item size, but if your distribution is more power-law, that would be 
wasteful. So we could pick the median, or whatever.  
   
   Once we have the full item-size distribution, it allows us to ask questions 
like: "if I choose a quantile from the size distribution and my _**n**_ turns 
out in practice to be doubled in size, what would the size of the sketch be?"  
Easy-peasy.
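
   To make that workflow concrete, here is a minimal sketch, not the library's code. `ItemSizePlanning`, `SizeModel`, `itemSizeAtRank` and `sizeAtNAndDoubleN` are hypothetical names made up for illustration. `SizeModel` only stands in for the proposed getMaxSerializedSizeBytes(k, n, itemSize), and the growth model itself is deliberately not reproduced here.

```java
import java.util.Arrays;

final class ItemSizePlanning {

    /** Hypothetical stand-in for the proposed size function; NOT the library's API. */
    interface SizeModel {
        long maxSerializedSizeBytes(int k, long n, int itemSizeBytes);
    }

    /** Nearest-rank quantile of the observed item sizes (bytes); rank must be in [0, 1]. */
    static int itemSizeAtRank(int[] observedSizes, double rank) {
        if (observedSizes.length == 0 || rank < 0.0 || rank > 1.0) {
            throw new IllegalArgumentException("need at least one size and a rank in [0, 1]");
        }
        int[] sorted = observedSizes.clone();   // leave the caller's array untouched
        Arrays.sort(sorted);
        int idx = (int) Math.ceil(rank * sorted.length) - 1;
        return sorted[Math.max(0, idx)];        // rank 0.0 maps to the smallest size
    }

    /** Returns {size at n, size at 2n} for the item size picked at the given rank. */
    static long[] sizeAtNAndDoubleN(SizeModel model, int k, long n,
                                    int[] observedSizes, double rank) {
        int itemSize = itemSizeAtRank(observedSizes, rank);
        return new long[] {
            model.maxSerializedSizeBytes(k, n, itemSize),
            model.maxSerializedSizeBytes(k, 2 * n, itemSize)
        };
    }
}
```

   With a heavy-tailed sample such as {8, 8, 16, 16, 32, 64, 4096} bytes, rank 0.5 picks 16 while rank 1.0 picks 4096, which is the "wasteful" case for a power-law distribution. Passing the real size function as the `SizeModel` would then answer the "what if n doubles" question in one call.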


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

