BloomFilter-s with Lucene

Andrzej Bialecki Fri, 30 Jan 2009 09:30:51 -0800

Hi all,

I've been using BloomFilters for various tasks, and I can't shake thefeeling that they could be of some use in Lucene internals, to speed upvarious membership tests, especially if we look for 100% correctnegatives, and we can accept a small rate of false positives.

For example, let's consider the recent discussion about deleted docs. Ifwe maintained a simple BloomFilter for deleted docs (per segment), thenwe could say for sure when a document is _not_ deleted. Since segmentsusually contain small percentage of deleted docs this shouldtremendously speed up the check. If the test returns true, then we needto actually look up the id in the list of deleted docs (because of asmall chance of false positives - but this perhaps can be acceptable insome situations?).

Another half-formed idea is to somehow use them in Filter-s, to save onspace and perhaps on lookup time, at the cost of some false positives.

I suspect there could be other places where we can use this structure.It doesn't have to be anything sophisticated or large - in some contextsit's perhaps enough to have a BloomFilter consisting of a single (long)value, plus a trivial hashing scheme for setting and probing bits inthat long value.


Sorry if this sounds too vague, it's just some food for thought ...

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

BloomFilter-s with Lucene

Reply via email to