[GitHub] [datasketches-java] leerho commented on issue #414: tuning theta sketch

GitBox Wed, 21 Sep 2022 13:07:33 -0700


leerho commented on issue #414:
URL: https://github.com/apache/datasketches-java/issues/414#issuecomment-1254174825


   A statement I made above was misleading.  I said:
   > Thus your relative error is also proportionately larger.
   I should have said "your absolute error is proportionately larger."
   
   The relative error of these sketches is constant and is determined by the 
parameter K (or Log K) that the user specifies.  
   Using the example I gave above, a unique counting sketch with a Log_2 K of 
16 will have a relative error of about 0.8% at 95% confidence.  This 
relative error stays the same no matter how many unique items the sketch has 
seen.  The absolute error, however, grows with the number of unique items.  
Populating this sketch with 1M unique items at a relative error of 0.8% means 
that your cardinality estimate will be in the range 1M * (1 +/- .008), or 
1M +/- 8K.  Populating it with 10M unique items means that your estimate will 
be in the range 10M * (1 +/- .008), or 10M +/- 80K.  Thus, the absolute error 
is proportional to the cardinality being estimated, but the relative error is 
constant.  
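   
   To make the arithmetic concrete, here is a minimal Python sketch.  It does 
not call the DataSketches library; it assumes the common approximation that 
the relative standard error is about 1/sqrt(K) and that two standard 
deviations correspond to roughly 95% confidence.  The function names are 
hypothetical and exist only for this illustration.

```python
import math


def relative_error(lg_k: int, num_std_dev: float = 2.0) -> float:
    """Approximate relative error for a sketch with K = 2**lg_k.

    Assumption: RSE ~= 1/sqrt(K); num_std_dev=2 is roughly 95% confidence.
    """
    k = 2 ** lg_k
    return num_std_dev / math.sqrt(k)


def absolute_error(n_unique: int, lg_k: int, num_std_dev: float = 2.0) -> float:
    """Absolute error bound: the relative error scaled by the cardinality."""
    return n_unique * relative_error(lg_k, num_std_dev)


if __name__ == "__main__":
    lg_k = 16
    print(f"relative error: {relative_error(lg_k):.4%}")
    for n in (1_000_000, 10_000_000):
        # The relative error is unchanged; only the absolute bound scales with n.
        print(f"{n:>12,} uniques: +/- {absolute_error(n, lg_k):,.0f}")
```

   Under these assumptions the relative error for Log_2 K = 16 is 2/256, or 
about 0.78%, which matches the "about 0.8%" figure, and the absolute bound 
grows tenfold when the number of unique items grows tenfold.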
   
   Because of my mistake, you concluded:
   > high cardinality sets because they have high relative error 
   
   Which is not correct.
   
   Cheers


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@datasketches.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
