On Thu, 22 Mar 2007, Omer Zak wrote:
On Thu, 2007-03-22 at 00:20 +0200, Peter wrote:
You also
probably want to read up on more advanced indexing methods (like Bloom
and Blooming filters and such) than what's available with ordinary off
the shelf databases.
I searched for information about Blooming filters.
Google gave me surprisingly irrelevant results when I looked up 'bloom'
and 'blooming filter' (apparently, the term is used also in photography
and botanics).
The 'Blooming' name was an unfortunate choice. The people who came up
with that are Israelis I think. Try to find their other academic work
and trace them, and see where they 'play'. Maybe you can talk to them.
However, the following does explain those filters:
http://en.wikipedia.org/wiki/Bloom_filter
Peter, can you suggest a summary article for the general subject of
advanced indexing methods?
I am not an expert on this, but any algorithm that runs in O(1) or close
to that for the data size you use is a candidate. The data size should
be obviously less than 2^32 for x86 at least in any indexable dimension
if you want a reasonably smooth ride. Larger things (like 30 million
record databases to be used in real-time) move up to better things like
64-bit x86 and up. F.ex. 30 million records will require less than 7
bits per entry just to keep a complete linear index in 3GB of RAM (the
maximum usable you can put in a x86 32 bit machine). 7 bits is not
enough to even make 30 million distinct pointers to 128 records in 3GB,
leaving no room for an OS (let alone SQL).
Peter
=================================================================
To unsubscribe, send mail to [EMAIL PROTECTED] with
the word "unsubscribe" in the message body, e.g., run the command
echo unsubscribe | mail [EMAIL PROTECTED]