leerho commented on issue #414:
URL: https://github.com/apache/datasketches-java/issues/414#issuecomment-1254174825
A statement I made above was misleading. I said:

> Thus your relative error is also proportionately larger.

I should have said "your absolute error is proportionately larger."

The relative error of these sketches is a constant, determined by the user-specified parameter K (or Log K). Using the example I gave above, a unique counting sketch with a Log_2 K of 16 will have a relative error of about 0.8% with 95% confidence. This Relative Error is constant no matter how big the sketch gets. But the Absolute Error is a function of the sketch size.

Populating this sketch with 1M items with a relative error of 0.8% means that your cardinality estimate will be in the range of 1M * (1 +/- 0.008), or +/- 8K.

Populating this sketch with 10M items means that your cardinality estimate will be in the range of 10M * (1 +/- 0.008), or +/- 80K.

Thus, the absolute error is proportional to the size of the sketch, but the relative error is constant.

Because of my mistake, you concluded:

> high cardinality sets because they have high relative error

Which is not correct.

Cheers

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@datasketches.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
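
The relative-versus-absolute error arithmetic in the comment above can be sketched in a few lines of plain Python. This is only an illustration of the numbers quoted there: it does not call the DataSketches library, and the helper name `absolute_error_bounds` and the constant `RELATIVE_ERROR` are hypothetical names introduced for this example. The 0.008 value is the approximate figure the comment gives for Log_2 K = 16 at 95% confidence.

```python
def absolute_error_bounds(true_count, relative_error):
    """Return (half_width, lower, upper) for an estimate of true_count.

    half_width is the absolute error: true_count * relative_error.
    """
    half_width = true_count * relative_error
    return half_width, true_count - half_width, true_count + half_width


# Assumed from the comment: about 0.8% relative error at 95% confidence.
RELATIVE_ERROR = 0.008

for n in (1_000_000, 10_000_000):
    half_width, lower, upper = absolute_error_bounds(n, RELATIVE_ERROR)
    # The relative error is the same for both inputs; only the absolute
    # half-width grows with the number of items.
    print(f"n={n:,}: +/- {half_width:,.0f}, range [{lower:,.0f}, {upper:,.0f}]")
```

Under these assumptions the half-width grows tenfold (from about 8K to about 80K) when the item count grows tenfold, while the ratio of half-width to item count stays at 0.008.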