Re: multiple collections indexing

Morus Walter Fri, 21 Mar 2003 05:12:46 -0800

Hi,
> 
> Are lots of different combinations of collections used frequently? 
> Probably not.  If only a handful of different subsets of collections are 
> frequently searched, then QueryFilter could be very useful.
> 
I ran some tests and thought the results might be interesting to others
as well.


I ran a number of queries against different combinations of collections,
from a single collection up to 33 collections per query.

Collections were selected by 
- OR combined queries on the collection id
- a QueryFilter
- a cached QueryFilter
- MultiSearch
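
The four selection methods listed above can be sketched with a toy
in-memory model. This is not Lucene code: the document list and every
function name below are hypothetical, and the sketch only shows where
each method does the work of restricting a search to some collections.

```python
# Toy in-memory model, not Lucene code. All names here are hypothetical.
DOCS = [
    ("a", "lucene index search"),
    ("b", "search filter cache"),
    ("c", "lucene filter"),
    ("a", "multi search"),
]

def matches(term):
    """Ids of all documents whose text contains the term."""
    return {i for i, (_, text) in enumerate(DOCS) if term in text.split()}

def build_filter(collections):
    """Ids of all documents belonging to one of the given collections."""
    return {i for i, (coll, _) in enumerate(DOCS) if coll in collections}

def or_combined(term, collections):
    """Restriction expressed as OR clauses evaluated with every query."""
    allowed = set()
    for coll in collections:          # one clause per collection id
        allowed |= build_filter({coll})
    return matches(term) & allowed

def filtered(term, collections, cache=None):
    """Restriction as a precomputed id set, reused if a cache is given."""
    key = frozenset(collections)
    if cache is None:
        allowed = build_filter(key)   # uncached: rebuilt for every request
    else:
        if key not in cache:
            cache[key] = build_filter(key)
        allowed = cache[key]
    return matches(term) & allowed

def multi_search(term, collections):
    """One separate index per collection; search each and merge the hits."""
    hits = set()
    for coll in collections:
        hits |= {i for i, (c, text) in enumerate(DOCS)
                 if c == coll and term in text.split()}
    return hits
```

All four functions return the same hits for the same term and set of
collections; they differ only in how much work is repeated per request.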

To separate out effects like file system caching, I ran all queries
three times and recorded the total search time for each method of
collection selection, both over all passes and over the second and third
passes only (assuming that caching effects show up in the first pass only).
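
The measurement scheme described above could look roughly like the
following sketch. The function name and the run_queries callback are
assumptions made for illustration; the original test harness is not
shown in this post.

```python
import time

def time_passes(run_queries, passes=3):
    """Run the query set several times.

    Returns (total_ms, warm_ms): the time over all passes, and the time
    over all passes except the first one.
    """
    per_pass = []
    for _ in range(passes):
        start = time.perf_counter()
        run_queries()                 # hypothetical: runs one full pass
        per_pass.append((time.perf_counter() - start) * 1000.0)
    return sum(per_pass), sum(per_pass[1:])
```

Reporting both numbers makes it possible to see how much of the total
is attributable to the first, cold pass.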

I also ran two variants: one test where the different methods were
applied to the same query one after another, and another test where all
queries for one method were run as a block.
The measured time covers only query preparation and search.
I did not access any results. That cost should be independent of
the query anyway.

Basically, I found that OR-combined queries are the slowest.
They take nearly twice as long as creating a filter for every request.
Caching filters is worthwhile: it saves a lot of time.
Using MultiSearch is also much faster than creating a filter for
every request. Whether a cached filter or MultiSearch is the better
choice therefore depends on the expected cache hit rate.

For the first test (different methods one after another for the same
queries) I got:
filter:          3810 (1591)  avg: 1270  (795)
filter (cached):  571   (77)  avg:  190   (38)
or combined:     6706 (2946)  avg: 2235 (1473)
multi:            872  (249)  avg:  290  (124)

(Times are in ms. The first number is the time for all three passes; the
second number, in parentheses, is the time excluding pass 1. The averages
are times per pass. One pass contained 24 queries: 4 different queries
for each of 6 different combinations of collections.)
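
As a cross-check on how the columns relate, the averages of the first
test can be recomputed from the totals. The sketch below assumes the
averages were truncated to whole milliseconds (integer division
reproduces them); the first-pass figure is simply the difference
between the two totals. The names are made up for this example.

```python
PASSES = 3

# (total over all passes, total excluding pass 1), in ms, first test
FIRST_TEST = {
    "filter": (3810, 1591),
    "filter (cached)": (571, 77),
    "or combined": (6706, 2946),
    "multi": (872, 249),
}

def summarize(total_ms, warm_ms):
    """Derive per-pass figures from the two totals."""
    return {
        "first_pass": total_ms - warm_ms,        # cost of the cold pass
        "avg_all": total_ms // PASSES,           # truncated, as in the table
        "avg_warm": warm_ms // (PASSES - 1),
    }

for method, (total_ms, warm_ms) in FIRST_TEST.items():
    print(method, summarize(total_ms, warm_ms))
```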

For the second test (all queries for one method run as a block) the
times are slightly different, but the picture is the same:
filter:          3447 (1675)  avg: 1149  (837)
filter (cached):  590  (107)  avg:  196   (53)
or combined:     6611 (2746)  avg: 2203 (1373)
multi:            504  (228)  avg:  168  (114)
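
The earlier remark that the choice between a cached filter and
MultiSearch depends on the cache hit rate can be made concrete with a
rough model. The sketch assumes that a cache miss costs about as much as
building a filter per request, that a hit costs what the cached filter
cost in the warm passes, and that the two mix linearly. These are
assumptions for illustration, not something measured in the tests.

```python
def break_even_hit_rate(uncached_ms, cached_ms, multi_ms):
    """Hit rate h where h*cached + (1-h)*uncached equals the multi time."""
    return (uncached_ms - multi_ms) / (uncached_ms - cached_ms)

# Per-pass averages excluding pass 1, taken from the two tables above.
h_first = break_even_hit_rate(uncached_ms=795, cached_ms=38, multi_ms=124)
h_second = break_even_hit_rate(uncached_ms=837, cached_ms=53, multi_ms=114)
print(f"{h_first:.2f} {h_second:.2f}")
```

Under this model the cached filter only beats MultiSearch once roughly
nine out of ten requests hit the cache.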

The tests were run on a 700 MHz Intel P3 box running Linux (RedHat 8.0,
kernel 2.4.18) with 1 GB RAM.
The combined index is ~750 MB; the separate indexes range in size from
0.5 to 100 MB.

greetings
        Morus
