Chris, thanks for your quick response.

: doing a few thousand BitSet intersections doesn't take as much time as you think

Even if each BitSet is around 4-5 million bits? And I would have to quickly go through about a thousand of these?
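To put rough numbers on that question, here is a minimal sketch using plain java.util.BitSet rather than anything from Lucene. The class name BitSetSizing, both helper methods, and the synthetic "every 3rd / every 5th document" data are assumptions made up for illustration. It only shows how one might measure this on one's own hardware; it makes no claim about what the timing will be.

```java
import java.util.BitSet;

class BitSetSizing {
    // Bytes needed to hold one bit per document, rounded up to whole bytes.
    static long bytesPerBitSet(long numDocs) {
        return (numDocs + 7) / 8;
    }

    // Number of documents present in both sets; neither argument is modified.
    static int intersectionCount(BitSet queryHits, BitSet categoryDocs) {
        BitSet copy = (BitSet) queryHits.clone();
        copy.and(categoryDocs);
        return copy.cardinality();
    }

    public static void main(String[] args) {
        int numDocs = 5_000_000; // index size mentioned in this thread
        int numSets = 1_000;     // how many cached sets get intersected per page load

        // Synthetic stand-ins for a query result and one cached sub-category.
        BitSet queryHits = new BitSet(numDocs);
        BitSet categoryDocs = new BitSet(numDocs);
        for (int doc = 0; doc < numDocs; doc += 3) queryHits.set(doc);
        for (int doc = 0; doc < numDocs; doc += 5) categoryDocs.set(doc);

        long start = System.nanoTime();
        long total = 0;
        for (int i = 0; i < numSets; i++) {
            // The same category set is reused here purely to exercise the loop.
            total += intersectionCount(queryHits, categoryDocs);
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println("bytes per set: " + bytesPerBitSet(numDocs));
        System.out.println("bytes for " + numSets + " sets: " + bytesPerBitSet(numDocs) * numSets);
        System.out.println(numSets + " intersections took " + elapsedMs + " ms (sum of counts: " + total + ")");
    }
}
```

One bit per document means 5,000,000 / 8 = 625,000 bytes per set, so 1,000 cached sets come to about 625 MB before any per-object overhead.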
I guess I would have to decide which sub-cats to cache the BitSets for. But a BitSet for 5 million documents would be around 625 KB each (right?), so I would need lots of RAM to do this for several thousand sub-cats. I read your link, and it would be the same for the mfgrs. How would you know which manufacturers to display? I'm not sure how many mfgrs are in each category for this site, but would you cache all manufacturers and intersect all of them with the initial query in one page load? Seems like that would be a lot.

So you're saying that in a single page load I might be able to do one initial query and intersect thousands of BitSets in under a second with an index of around 5 million documents, assuming the server/PC is a decent speed and has enough memory? I would still have to cache thousands of 625 KB BitSets. Could I do this with files instead of RAM, maybe?

Thanks,
Mike

On 1/25/06, Chris Hostetter <[EMAIL PROTECTED]> wrote:
>
> You will likely find this thread interesting...
>
> http://www.nabble.com/Announcement%3A-Lucene-powering-CNET.com-Product-Category-Listings-t266441.html
>
> : 1) Do queries for each sub-category using the results of the first initial
> : query and use the hits count to select the sub-categories to display, but I
> : might have thousands of sub-categories and it would be too slow..
>
> The key is not to repeat the query for every sub-cat with an added clause,
> it's to do the query once using a HitCollector that generates a BitSet of
> all matching results, and then intersect that with BitSets returned by
> Filter's (or other HitCollectors) that you've used to get a list of *all*
> results in each sub-cat.
>
> why is this faster? you ask .. because only the initial query changes on
> each user search -- the set of all documents in a sub-cat doesn't change
> until new documents are added or deleted, so they can be cached (either
> manually, or using CachingWrapperFilter) ...
> doing a few thousand BitSet intersections doesn't take as much time as you
> think.
>
> -Hoss
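The approach described in the quoted message can be sketched roughly as follows. This is only an illustration built on java.util.BitSet: in a real Lucene setup the query BitSet would come from a HitCollector and the per-category sets from cached Filters, whereas here both are filled in by hand. The class name FacetCounts, the method countPerCategory, and the category names are all hypothetical.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

class FacetCounts {
    // For each cached category, count how many query hits fall inside it.
    // Categories with no matching documents are left out, so the result
    // doubles as the list of sub-categories worth displaying.
    static Map<String, Integer> countPerCategory(BitSet queryHits,
                                                 Map<String, BitSet> cachedCategoryDocs) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> entry : cachedCategoryDocs.entrySet()) {
            // Work on a copy so neither the query result nor the cache is altered.
            BitSet overlap = (BitSet) queryHits.clone();
            overlap.and(entry.getValue());
            int n = overlap.cardinality();
            if (n > 0) {
                counts.put(entry.getKey(), n);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hand-built cache: document ids per category (built once, reused per search).
        Map<String, BitSet> cache = new LinkedHashMap<>();
        BitSet cameras = new BitSet();
        cameras.set(0, 3);              // docs 0, 1, 2
        BitSet tvs = new BitSet();
        tvs.set(3, 5);                  // docs 3, 4
        BitSet pcs = new BitSet();
        pcs.set(5);                     // doc 5
        cache.put("cameras", cameras);
        cache.put("tvs", tvs);
        cache.put("pcs", pcs);

        // Result of the one query that runs per user search: docs 1, 2, 3.
        BitSet queryHits = new BitSet();
        queryHits.set(1, 4);

        System.out.println(countPerCategory(queryHits, cache)); // {cameras=2, tvs=1}
    }
}
```

Only queryHits changes between searches; the map of category sets stays the same until the index changes, which is what makes caching it worthwhile. The same loop would apply to a cache keyed by manufacturer.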