Comment #1 on issue 10584 by [email protected]: SB Bloom filter false positive rate too high
http://code.google.com/p/chromium/issues/detail?id=10584

I ran a test with an existing set of SafeBrowsing data while varying the size of our Bloom filter, and examined the results of looking up 1 million unique popular URLs:

Multiplier  Hits  Misses  Size (bytes)
13          871   32147   459788
14          871   21434   495156
15          871   16401   530524
16          871   13008   565893
17          871   11669   601261
18          871    9495   636629
19          871    8077   671997

- Multiplier is the constant by which we size the filter (currently set to 13 in Chrome).
- Hits is the number of URLs that had a prefix in the Bloom filter AND were also in the database (i.e. valid candidates for a gethash request).
- Misses is the number of URLs that had a prefix in the Bloom filter but were NOT in the database (i.e. false positives that would cause an unnecessary gethash request).
- Size is the number of bytes that the filter consumes in memory during normal operation.
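
To make the trade-off in the table easier to compare, here is a small Python sketch that turns each row into a per-URL false positive rate (Misses divided by the 1,000,000 URLs tested). For each setting it also shows how many extra bytes it costs relative to the current multiplier of 13, and how many unnecessary gethash requests it avoids. It only does arithmetic on the numbers above. The names ROWS and summarize are illustrative, not part of Chrome.

```python
# Measurements from the table above: (multiplier, hits, misses, size_bytes),
# collected by looking up 1,000,000 unique popular URLs.
ROWS = [
    (13, 871, 32147, 459788),
    (14, 871, 21434, 495156),
    (15, 871, 16401, 530524),
    (16, 871, 13008, 565893),
    (17, 871, 11669, 601261),
    (18, 871, 9495, 636629),
    (19, 871, 8077, 671997),
]
TOTAL_LOOKUPS = 1_000_000


def summarize(rows, total=TOTAL_LOOKUPS):
    """For each row, return (multiplier, false_positive_rate, extra_bytes,
    misses_avoided), where the last two are relative to the first row."""
    _, _, base_misses, base_size = rows[0]
    out = []
    for mult, _hits, misses, size in rows:
        fp_rate = misses / total  # fraction of URLs causing a needless gethash
        out.append((mult, fp_rate, size - base_size, base_misses - misses))
    return out


for mult, fp_rate, extra_bytes, avoided in summarize(ROWS):
    print(f"{mult:2d}  fp={fp_rate:.2%}  +{extra_bytes:6d} bytes  "
          f"-{avoided:5d} unnecessary gethash requests")
```

For example, going from 13 to 14 removes about a third of the misses (32,147 to 21,434) for 35,368 extra bytes, while each later step saves fewer misses than that first one.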

From this, we can see that increasing the multiplier, and using slightly more memory for steady-state operation of the filter, will decrease the false positive rate. Going from 13 to 19 cuts misses from 32,147 to 8,077 at a cost of 212,209 additional bytes. I can generate some further data to see how much further we can decrease the rate, and what the approximate increase in memory will be.
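
As a baseline for estimating how far the rate can be pushed before running more tests, the textbook Bloom filter approximation p = (1 - e^(-kn/m))^k may be useful. The sketch below makes three assumptions that are not confirmed by the data above. It treats the multiplier as the number of filter bits per stored prefix (the Size column does grow linearly, by about 35,368 bytes per unit of multiplier). It picks the hash count k optimally. It also assumes that each URL triggers a hypothetical fixed number of independent prefix lookups (LOOKUPS_PER_URL). Chrome's actual hash count and lookup pattern may differ, so the output is a rough model rather than a prediction of the measured Misses.

```python
import math


def bloom_fp_rate(bits_per_entry: float, num_hashes: int) -> float:
    """Textbook false positive estimate for a Bloom filter with
    m/n = bits_per_entry and k = num_hashes: p = (1 - e^(-k*n/m))^k."""
    return (1.0 - math.exp(-num_hashes / bits_per_entry)) ** num_hashes


def optimal_hashes(bits_per_entry: float) -> int:
    """Hash count minimizing p for a given m/n: round((m/n) * ln 2), at least 1."""
    return max(1, round(bits_per_entry * math.log(2)))


def per_url_fp_rate(per_lookup: float, lookups_per_url: int) -> float:
    """Chance that at least one of several independent prefix lookups
    for a single URL is a false positive."""
    return 1.0 - (1.0 - per_lookup) ** lookups_per_url


# Hypothetical: multiplier = bits per prefix, 10 prefix lookups per URL.
# Neither value is taken from Chrome's implementation.
LOOKUPS_PER_URL = 10

for multiplier in range(13, 20):
    k = optimal_hashes(multiplier)
    p = bloom_fp_rate(multiplier, k)
    print(f"{multiplier:2d} bits/prefix  k={k:2d}  per-lookup={p:.4%}  "
          f"per-URL={per_url_fp_rate(p, LOOKUPS_PER_URL):.3%}")
```

If the measured per-URL miss rate (3.2% at multiplier 13) stays well above such estimates under plausible lookup counts, that gap may point at the hash count or hash quality rather than at filter size alone.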

We can push out a test fix with this change immediately and see what happens in the wild.