: for this site, but would you cache all manufacturers and intersect all with
: the initial query in one page load? Seems like that would be a lot.

Yep, it is a lot, but if you've got the RAM, it's not that time intensive.
At CNET, depending on what page you are looking at, I'm doing
anywhere from 100-1000 intersections.

: So you're saying that in a single page load I might be able to do one initial
: query, and intersect thousands of bitsets in under a second with an
: index of around 5 million documents, assuming that the server/pc is decent

I can't predict exactly what your performance will be based on your
index size -- and I certainly can't make any promises about it being under
a second total time ... but you should try it, I think it's very feasible.
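
For a quick feel for the intersect-and-count step, here is a minimal sketch using plain java.util.BitSet. The class name CategoryCounts, the method count, and the category names are made up for illustration; none of it is Lucene API, and in a real setup the BitSets would come from a HitCollector and cached Filters.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene: counts query hits per category.
class CategoryCounts {
    // For each category, count how many of the query's hits fall in it.
    // Categories with no overlap are left out of the result.
    static Map<String, Integer> count(BitSet queryHits, Map<String, BitSet> categories) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> e : categories.entrySet()) {
            BitSet tmp = (BitSet) queryHits.clone(); // and() mutates, so work on a copy
            tmp.and(e.getValue());
            int n = tmp.cardinality();
            if (n > 0) {
                counts.put(e.getKey(), n);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        BitSet hits = new BitSet(16);
        hits.set(1); hits.set(3); hits.set(8);

        BitSet sony = new BitSet(16);  sony.set(1, 4); // docs 1, 2, 3
        BitSet canon = new BitSet(16); canon.set(8);
        BitSet nikon = new BitSet(16); nikon.set(5);

        Map<String, BitSet> cats = new LinkedHashMap<>();
        cats.put("sony", sony);
        cats.put("canon", canon);
        cats.put("nikon", nikon);

        System.out.println(count(hits, cats)); // prints {sony=2, canon=1}
    }
}
```

Filling the map with a few thousand BitSets sized to your index and timing count() is probably the fastest way to find out what your own numbers look like.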

: speed and enough memory? I still would have to cache thousands of 625 KB
: bitsets.. Could I do this with files instead of RAM maybe?

I played with serializing BitSets to disk, it can be done .. but reading
the cached files has a definite added cost.  If you really don't have the
RAM to store all of the BitSets in memory then you're going to want to at
least use an in-memory cache for some fixed number of commonly used
categories ... whether that cache falls back to reading from disk on a miss,
or falls back to executing the category-specific filter/query again,
depends on your goals.
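
A rough sketch of that kind of fixed-size cache, assuming a least-recently-used eviction policy (my choice for the example, other policies would work too). BitSetCache and its loader argument are hypothetical names; the loader is where you would plug in either "read the serialized BitSet from disk" or "re-run the category filter".

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical fixed-size LRU cache of per-category BitSets.
class BitSetCache {
    private final Function<String, BitSet> loader; // fallback used on a cache miss
    private final Map<String, BitSet> lru;
    int misses = 0; // exposed only so the behaviour is easy to observe

    BitSetCache(final int maxEntries, Function<String, BitSet> loader) {
        this.loader = loader;
        // accessOrder=true keeps entries ordered least-recently-used first,
        // and removeEldestEntry drops that entry once the cache is over capacity.
        this.lru = new LinkedHashMap<String, BitSet>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, BitSet> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // Returns the cached BitSet, loading (and caching) it on a miss.
    synchronized BitSet get(String category) {
        BitSet bits = lru.get(category);
        if (bits == null) {
            misses++;
            bits = loader.apply(category);
            lru.put(category, bits);
        }
        return bits;
    }
}
```

Callers should treat the returned BitSets as read-only (clone before calling and()), otherwise one request will corrupt the cached copy for everyone else.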

You should also keep in mind that if your goal is only to suggest *good*
matching categories and not necessarily to provide a list of *all*
matching categories or the *top* matching categories sorted by hits, then
you don't need to cache/intersect all of the BitSets ... you can try the
biggest ones first, and once you've got "enough" "good" matches you can
stop and return what you have so far to the user.

quantifying "enough" and "good" depends (again) on your goals.
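
As one possible reading of "biggest first, stop when you have enough", here is a small sketch. It assumes "good" means at least minHits overlapping documents and "enough" means a fixed number of categories; both thresholds, and the names CategorySuggester and suggest, are invented for the example.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical helper: suggest categories without intersecting every BitSet.
class CategorySuggester {
    static List<String> suggest(BitSet queryHits, Map<String, BitSet> categories,
                                int minHits, int enough) {
        List<Map.Entry<String, BitSet>> bySize = new ArrayList<>(categories.entrySet());
        // Largest categories first. A real system would precompute the sizes
        // once rather than calling cardinality() inside the comparator.
        bySize.sort(Comparator.comparingInt(
                (Map.Entry<String, BitSet> e) -> e.getValue().cardinality()).reversed());

        List<String> good = new ArrayList<>();
        for (Map.Entry<String, BitSet> e : bySize) {
            if (good.size() >= enough) {
                break; // stop early, the remaining BitSets are never touched
            }
            BitSet tmp = (BitSet) queryHits.clone(); // and() mutates, so copy first
            tmp.and(e.getValue());
            if (tmp.cardinality() >= minHits) {
                good.add(e.getKey());
            }
        }
        return good;
    }
}
```

The result is in category-size order, not hit-count order, which is exactly the trade-off: you give up the *top* list in exchange for doing fewer intersections.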


: On 1/25/06, Chris Hostetter <[EMAIL PROTECTED]> wrote:
: >
: >
: > You will likely find this thread interesting...
: >
: >
: > http://www.nabble.com/Announcement%3A-Lucene-powering-CNET.com-Product-Category-Listings-t266441.html
: >
: > : 1) Do queries for each sub-category using the results of the first initial
: > : query and use the hits count to select the sub-categories to display, but I
: > : might have thousands of sub-categories and it would be too slow..
: >
: > The key is not to repeat the query for every sub-cat with an added clause,
: > it's to do the query once using a HitCollector that generates a BitSet of
: > all matching results, and then intersect that with BitSets returned by
: > Filters (or other HitCollectors) that you've used to get a list of *all*
: > results in each sub-cat.
: >
: > why is this faster? you ask .. because only the initial query changes on
: > each user search -- the set of all documents in a sub-cat doesn't change
: > until documents are added or deleted, so they can be cached (either
: > manually, or using CachingWrapperFilter) ... doing a few thousand BitSet
: > intersections doesn't take as much time as you think.
: >
: >
: >
: >
: > -Hoss
: >
: >
: > ---------------------------------------------------------------------
: > To unsubscribe, e-mail: [EMAIL PROTECTED]
: > For additional commands, e-mail: [EMAIL PROTECTED]
: >
: >
:



-Hoss


