Hi Team, I thought this might be interesting to some of us :)

If anyone out there also struggles to understand / intuit how a bloom
filter's multiple hash functions (which Lucene's bloom filter implements)
improve the false positive error rate, here is an interactive 3D
surface/chart to see how hash function count (k), bits per element (m/n),
and false positive error rate relate:

    https://githubsearch.mikemccandless.com/gemini_bloom10.html

The mouseover also shows the saturation (percent bits set) but it's not
rendered in the graph.

I created this by iterating with Gemini Pro 3 ... see the full
prompting/iterations here: https://gemini.google.com/share/46a219e0abaf

Net/net doing multiple hash functions into the same backing bit set is
surprisingly worthwhile even as the saturation grows, up to 50%.

It's entirely possible Gemini hallucinated something in the graph!!  But it
seems sorta correct to me.  And of course that false positive error rate
formula involves e!

Mike McCandless

http://blog.mikemccandless.com

Reply via email to