Oops, I meant to respond to Jaap van der Plas with this message. Sorry,
Paul.
On Sep 26, 2008, at 3:37 PM, Damien Katz wrote:
Your requirements as stated would be well met by something like
Lucene.
However, another possible way to go about this is to permute the key
sets into key arrays and emit each. The number of keys would
normally be (N!)/2, where N is the number of fields you are
indexing. However, we can use view collation to do range lookups,
which allows us to ignore the different array key suffixes. That would
reduce the number of key arrays emitted per document to 2^N. If each
document has 10 fields, then the number of permutations would be
2^10 or 1024 keys emitted per doc.
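
A minimal sketch of the emit step, written in Python rather than as a real CouchDB map function. It assumes "one key array per subset of fields, in a fixed field order" is the intended reading; the helper name key_arrays and the sample document are hypothetical.

```python
from itertools import combinations


def key_arrays(doc, fields):
    """Yield one key array per subset of `fields`, keeping a fixed field order.

    Each key array is a list of [field_name, value] pairs. A fixed order
    means a lookup on a prefix of the array can ignore whatever suffix follows.
    """
    ordered = sorted(fields)  # assumption: alphabetical order is the fixed order
    for size in range(len(ordered) + 1):
        for subset in combinations(ordered, size):
            yield [[name, doc[name]] for name in subset]


# Hypothetical document with N = 3 indexed fields.
doc = {"brand": "Acme", "color": "red", "size": "M"}
keys = list(key_arrays(doc, doc.keys()))

print(len(keys))  # 8, i.e. 2**3, counting the empty key array
```

With 10 fields the same loop yields 2**10 = 1024 key arrays per document. In an actual view each of these would be passed to emit() inside the map function.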
To build that index for 50,000 documents would take an on-disk view
index of roughly 51,200,000 rows (50,000 x 1,024). Building it will take a very long time and
it will take a lot of disk space. But once built, it should then be
possible to do categorized, drill-down searches that can show
you relevant sub-categories and their counts to further narrow down
the search, and do so pretty efficiently. This is very much the kind of
thing Endeca does for online retailers.
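
The row estimate is just documents times keys per document. A quick check, assuming every document has exactly 10 indexed fields and emits all 2^10 key arrays (the variable names are illustrative only):

```python
# Back-of-the-envelope size of the view index described above.
num_docs = 50_000
fields_per_doc = 10

keys_per_doc = 2 ** fields_per_doc   # one key array per subset of fields
total_rows = num_docs * keys_per_doc

print(keys_per_doc)  # 1024
print(total_rows)    # 51200000, on the order of 50 million rows
```

Each additional indexed field doubles the row count, so the number of fields matters far more than the number of documents.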
I don't know if CouchDB views are up to it yet, but it might be
worth experimenting.
-Damien
On Sep 26, 2008, at 2:11 PM, Paul Davis wrote: