Comment #1 on issue 10584 by [email protected]: SB Bloom filter false positive rate too high http://code.google.com/p/chromium/issues/detail?id=10584
I ran a test with an existing set of SafeBrowsing data, varying the size of our Bloom filter, and examined the results of looking up 1 million unique popular URLs:

Multiplier   Hits   Misses   Size (bytes)
        13    871    32147         459788
        14    871    21434         495156
        15    871    16401         530524
        16    871    13008         565893
        17    871    11669         601261
        18    871     9495         636629
        19    871     8077         671997

- Multiplier is the constant by which we size the filter (currently set to 13 in Chrome).
- Hits is the number of URLs that had a prefix in the Bloom filter AND were also in the database (i.e. valid candidates for a gethash request).
- Misses is the number of URLs that had a prefix in the Bloom filter but were NOT in the database (i.e. false positives that would trigger an unnecessary gethash request).
- Size is the number of bytes the filter consumes in memory during normal operation.

From this, we can see that increasing the multiplier, at the cost of slightly more memory for steady-state operation of the filter, decreases the false positive rate. Going from 13 to 19 cuts misses from 32147 to 8077, while the filter grows by about 35 KB per step (from 459788 to 671997 bytes). I can generate further data to see how much further we can decrease the rate and what the approximate increase in memory will be. We can push out a test fix with this change immediately and see what happens in the wild.
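A minimal sketch for summarizing the table, assuming that Multiplier means "bits per stored prefix" (an inference: Size x 8 / Multiplier comes out to about 282,946 for every row, which suggests a fixed prefix count). The helper names `summarize` and `theoretical_fp` are hypothetical, and the textbook estimate (1 - e^(-k n/m))^k uses an assumed number of hash functions k, not necessarily what Chrome's filter uses. Note that the textbook figure is a per-probe rate, while Misses counts URLs, so the two are not directly comparable if one URL lookup probes several prefixes.

```python
import math

# Rows copied from the table above: (multiplier, hits, misses, size_bytes).
ROWS = [
    (13, 871, 32147, 459788),
    (14, 871, 21434, 495156),
    (15, 871, 16401, 530524),
    (16, 871, 13008, 565893),
    (17, 871, 11669, 601261),
    (18, 871, 9495, 636629),
    (19, 871, 8077, 671997),
]
LOOKUPS = 1_000_000  # unique popular URLs checked in the test


def summarize(rows, lookups=LOOKUPS):
    """Return (multiplier, misses per URL, implied prefix count, extra bytes vs. first row)."""
    base_size = rows[0][3]
    out = []
    for mult, _hits, misses, size in rows:
        miss_rate = misses / lookups
        # ASSUMPTION: size_in_bits == multiplier * number_of_prefixes.
        implied_prefixes = size * 8 / mult
        out.append((mult, miss_rate, implied_prefixes, size - base_size))
    return out


def theoretical_fp(bits_per_item, k=None):
    """Textbook Bloom filter false positive estimate (1 - e^(-k*n/m))^k per probe.

    ASSUMPTION: if k is None, use the optimal k = ln(2) * m/n rounded (at least 1);
    the real filter's hash count may differ.
    """
    if k is None:
        k = max(1, round(math.log(2) * bits_per_item))
    return (1 - math.exp(-k / bits_per_item)) ** k


if __name__ == "__main__":
    for mult, rate, n, extra in summarize(ROWS):
        print(f"{mult:>2}  misses/URL={rate:.4%}  implied prefixes={n:,.0f}  "
              f"+{extra:,} bytes  textbook per-probe FP={theoretical_fp(mult):.4%}")
```

The implied prefix count staying constant across rows is what makes the memory cost linear in the multiplier (roughly 35,368 bytes per step here), so further data points mainly trade that fixed per-step cost against the shrinking miss count.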
