On Thu, 22 Mar 2007, Omer Zak wrote:

On Thu, 2007-03-22 at 00:20 +0200, Peter wrote:
 You also
probably want to read up on more advanced indexing methods (like Bloom
and Blooming filters and such) than what's available with ordinary off
the shelf databases.

I searched for information about Blooming filters.

Google gave me surprisingly irrelevant results when I looked up 'bloom'
and 'blooming filter' (apparently, the term is used also in photography
and botanics).

The 'Blooming' name was an unfortunate choice. The people who came up with that are Israelis I think. Try to find their other academic work and trace them, and see where they 'play'. Maybe you can talk to them.

However, the following does explain those filters:
http://en.wikipedia.org/wiki/Bloom_filter

Peter, can you suggest a summary article for the general subject of
advanced indexing methods?

I am not an expert on this, but any algorithm that runs in O(1) or close to that for the data size you use is a candidate. The data size should be obviously less than 2^32 for x86 at least in any indexable dimension if you want a reasonably smooth ride. Larger things (like 30 million record databases to be used in real-time) move up to better things like 64-bit x86 and up. F.ex. 30 million records will require less than 7 bits per entry just to keep a complete linear index in 3GB of RAM (the maximum usable you can put in a x86 32 bit machine). 7 bits is not enough to even make 30 million distinct pointers to 128 records in 3GB, leaving no room for an OS (let alone SQL).

Peter

=================================================================
To unsubscribe, send mail to [EMAIL PROTECTED] with
the word "unsubscribe" in the message body, e.g., run the command
echo unsubscribe | mail [EMAIL PROTECTED]

Reply via email to