Oops, I meant to respond to Jaap van der Plas with this message. Sorry Paul.

On Sep 26, 2008, at 3:37 PM, Damien Katz wrote:

Your requirements as stated would be well met by something like Lucene.

Another possible way to go about this is to permute the key sets into key arrays and emit each. The number of keys would normally be (N!)/2, where N is the number of fields you are indexing. However, we can use view collation to do range lookups, which allows us to ignore the different array key suffixes. That reduces the number of key arrays emitted per document to 2^N. If each document has 10 fields, that is 2^10, or 1024, keys emitted per doc.
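
As a rough illustration of the 2^N scheme, here is a small Python sketch. It is not CouchDB view code (a real map function would be JavaScript calling emit), and the names key_arrays and prefix_matches, the sample document, and the choice to sort field names are all assumptions made for the example. It emits one key array per subset of fields in a fixed order, then mimics a collation range lookup by matching on a key prefix.

```python
from itertools import combinations


def key_arrays(doc, fields):
    """Yield one key array per subset of `fields`, always in the same field order.

    Each key array is a list of [field, value] pairs. Keeping a fixed order
    is what lets a prefix (range) lookup stand in for the other permutations.
    """
    ordered = sorted(fields)
    for size in range(len(ordered) + 1):
        for subset in combinations(ordered, size):
            yield [[name, doc[name]] for name in subset]


def prefix_matches(keys, prefix):
    """Stand-in for a startkey/endkey range lookup: keep keys that begin with `prefix`."""
    return [key for key in keys if key[:len(prefix)] == prefix]


# Hypothetical document with three indexed fields.
doc = {"brand": "acme", "color": "red", "size": "M"}
keys = list(key_arrays(doc, doc.keys()))

print(len(keys))  # 2^3 = 8 key arrays, including the empty one
for key in prefix_matches(keys, [["brand", "acme"]]):
    print(key)
```

In CouchDB itself the prefix match would presumably be a view query with startkey set to the prefix array and endkey set to the same array with a trailing {} appended, since objects collate after all other values.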

To build that index for 50,000 documents would take an on-disk view index of roughly 50,000,000 rows (50,000 x 1024 = 51,200,000). Building it will take a very long time and a lot of disk space. But once built, it should then be possible to do categorized, drill-down searches that show you relevant sub-categories and their counts to further narrow the search, and do so pretty efficiently. This is very much the kind of thing Endeca does for online retailers.

I don't know if CouchDB views are up to it yet, but it might be worth experimenting.

-Damien


On Sep 26, 2008, at 2:11 PM, Paul Davis wrote:
