I am sorry, but after applying this patch, the performance on my tests is
worse than on the lucene-2.9-dev trunk.
TEST1: using *filter.getDocIdSet(reader)*;
*Test results (Num docs = 2,940,738) using lucene-core-2.9-dev trunk*
*1 Original index (12 collections * 6 months = 72 indexes)*
1a Range [20090101000000 - 20090131235959] --> 379,560 docs
2,274 ms 1,477 ms 1,283 ms
1b Range [20081201000000 - 20090131235959] --> 974,754 docs
4,489 ms 3,333 ms 3,390 ms
1c Range [20081001000000 - 20090131235959] --> 2,197,590 docs
8,482 ms 7,471 ms 7,424 ms
*2 Consolidated index (1 index)*
2a Range [20090101000000 - 20090131235959] --> 379,560 docs
492 ms 116 ms 83 ms
2b Range [20081201000000 - 20090131235959] --> 974,754 docs
640 ms 159 ms 138 ms
2c Range [20081001000000 - 20090131235959] --> 2,197,590 docs
817 ms 322 ms 295 ms
*Test results (Num docs = 2,940,738) using lucene-core-2.9-dev trunk + LUCENE-1596 patch*
*1 Original index (12 collections * 6 months = 72 indexes)*
1a Range [20090101000000 - 20090131235959] --> 379,560 docs
3,699 ms 3,347 ms 1,368 ms
1b Range [20081201000000 - 20090131235959] --> 974,754 docs
6,508 ms 4,540 ms 6,151 ms
1c Range [20081001000000 - 20090131235959] --> 2,197,590 docs
15,941 ms 10,440 ms 13,622 ms
*2 Consolidated index (1 index)*
2a Range [20090101000000 - 20090131235959] --> 379,560 docs
514 ms 70 ms 63 ms
2b Range [20081201000000 - 20090131235959] --> 974,754 docs
708 ms 165 ms 137 ms
2c Range [20081001000000 - 20090131235959] --> 2,197,590 docs
782 ms 430 ms 602 ms
TEST2: using *searcher.search(query, filter, 10);*
*Test results (Num docs = 2,940,738) using lucene-core-2.9-dev trunk*
*1 Original index (12 collections * 6 months = 72 indexes)*
1a Range [20090101000000 - 20090131235959] --> 379,560 docs
1,187 ms 273 ms 416 ms
1b Range [20081201000000 - 20090131235959] --> 974,754 docs
1,539 ms 764 ms 571 ms
1c Range [20081001000000 - 20090131235959] --> 2,197,590 docs
2,235 ms 1,503 ms 1,260 ms
*2 Consolidated index (1 index)*
2a Range [20090101000000 - 20090131235959] --> 379,560 docs
385 ms 85 ms 73 ms
2b Range [20081201000000 - 20090131235959] --> 974,754 docs
490 ms 208 ms 196 ms
2c Range [20081001000000 - 20090131235959] --> 2,197,590 docs
707 ms 361 ms 317 ms
*Test results (Num docs = 2,940,738) using lucene-core-2.9-dev trunk + LUCENE-1596 patch*
*1 Original index (12 collections * 6 months = 72 indexes)*
1a Range [20090101000000 - 20090131235959] --> 379,560 docs
1,181 ms 375 ms 237 ms
1b Range [20081201000000 - 20090131235959] --> 974,754 docs
1,670 ms 749 ms 550 ms
1c Range [20081001000000 - 20090131235959] --> 2,197,590 docs
3,379 ms 2,409 ms 2,470 ms
*2 Consolidated index (1 index)*
2a Range [20090101000000 - 20090131235959] --> 379,560 docs
444 ms 72 ms 72 ms
2b Range [20081201000000 - 20090131235959] --> 974,754 docs
576 ms 208 ms 140 ms
2c Range [20081001000000 - 20090131235959] --> 2,197,590 docs
907 ms 484 ms 373 ms
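
To put a rough number on the difference, here is a small sketch that compares the best of the three timings for case 1c in both tests. It assumes the three figures on each line are repeated runs of the same query; the class and method names are made up for illustration.

```java
final class SlowdownSketch {
    /** Smallest value in a set of timings (milliseconds). */
    static long best(long[] timingsMs) {
        long min = timingsMs[0];
        for (long t : timingsMs) {
            if (t < min) min = t;
        }
        return min;
    }

    /** Best patched time divided by best trunk time; above 1.0 means the patch is slower. */
    static double ratio(long[] trunkMs, long[] patchedMs) {
        return (double) best(patchedMs) / (double) best(trunkMs);
    }

    public static void main(String[] args) {
        // Case 1c (72 indexes, 2,197,590 docs), timings copied from the results above.
        double test1 = ratio(new long[] {8482, 7471, 7424}, new long[] {15941, 10440, 13622});
        double test2 = ratio(new long[] {2235, 1503, 1260}, new long[] {3379, 2409, 2470});
        System.out.println("TEST1 1c patched/trunk = " + test1);
        System.out.println("TEST2 1c patched/trunk = " + test2);
    }
}
```

On best-of-three numbers this gives about 1.41x for TEST1 and about 1.91x for TEST2 on the 72-index case with the widest range.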
Raf
On Sat, Apr 11, 2009 at 11:21 PM, Yonik Seeley
<[email protected]> wrote:
> OK, I think this will improve the situation:
> https://issues.apache.org/jira/browse/LUCENE-1596
>
> -Yonik
> http://www.lucidimagination.com
>
>
> On Fri, Apr 10, 2009 at 1:47 PM, Michael McCandless
> <[email protected]> wrote:
> > We never fully explained it, but we have some ideas...
> >
> > It's only if you iterate each term, and do a TermDocs.seek for each,
> > that Multi*Reader seems to show the problem. Just iterating the terms
> > seems OK (I have a 51 segment index, and I can iterate ~ 10M unique
> > terms in ~8 seconds).
> >
> > But loading FieldCache, or doing eg RangeQuery, also does a
> > MultiTermDocs.seek on each term, which in turn calls
> > SegmentTermDocs.seek for each of the sub-readers in sequence. I
> > *think* maybe for highly unique terms, where typically all segments
> > but one actually lack the term, the cost of invoking seek on those
> > segments without the term is high. Really, somehow, we want to only
> > call seek on those segments that have the term, which we know from the
> > pqueue...
> >
> > Mike
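
The idea in the message above can be illustrated with a toy model. This is only a sketch and does not use any Lucene classes: `Segment`, `Cursor`, `naiveSeeks` and `guidedSeeks` are hypothetical names, and a "seek" is just a counter increment. It compares seeking every segment for every term against seeking only the segments that a priority-queue merge of the per-segment term lists reports as holding the current term.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.TreeSet;

final class SeekCostSketch {
    /** Toy stand-in for an index segment: only its sorted terms. Not a Lucene class. */
    static final class Segment {
        final TreeSet<String> terms = new TreeSet<>();
    }

    /** Position inside one segment's sorted term list. */
    static final class Cursor {
        final Iterator<String> it;
        String current;
        Cursor(Iterator<String> it) { this.it = it; this.current = it.next(); }
        boolean advance() {
            if (!it.hasNext()) return false;
            current = it.next();
            return true;
        }
    }

    /** Naive strategy: for every distinct term, seek on every segment. */
    static long naiveSeeks(List<Segment> segments) {
        TreeSet<String> distinct = new TreeSet<>();
        for (Segment s : segments) distinct.addAll(s.terms);
        return (long) distinct.size() * segments.size();
    }

    /** Guided strategy: merge the term lists with a priority queue and
     *  seek only on segments whose current term equals the merged term. */
    static long guidedSeeks(List<Segment> segments) {
        PriorityQueue<Cursor> pq =
            new PriorityQueue<>((a, b) -> a.current.compareTo(b.current));
        for (Segment s : segments) {
            if (!s.terms.isEmpty()) pq.add(new Cursor(s.terms.iterator()));
        }
        long seeks = 0;
        while (!pq.isEmpty()) {
            String term = pq.peek().current;
            // Every cursor at the top with this term marks a segment that has it.
            while (!pq.isEmpty() && pq.peek().current.equals(term)) {
                Cursor c = pq.poll();
                seeks++;
                if (c.advance()) pq.add(c);
            }
        }
        return seeks;
    }

    /** Builds segments in which every term occurs in exactly one segment. */
    static List<Segment> uniqueTermSegments(int segmentCount, int termCount) {
        List<Segment> segments = new ArrayList<>();
        for (int i = 0; i < segmentCount; i++) segments.add(new Segment());
        for (int t = 0; t < termCount; t++) {
            segments.get(t % segmentCount).terms.add(String.format("%08d", t));
        }
        return segments;
    }

    public static void main(String[] args) {
        List<Segment> segments = uniqueTermSegments(72, 10000);
        System.out.println("naive seeks:  " + naiveSeeks(segments));  // 720000
        System.out.println("guided seeks: " + guidedSeeks(segments)); // 10000
    }
}
```

With 72 segments and fully unique terms, the naive count is 72 times the guided one. The model says nothing about how expensive a single seek on a segment without the term really is; it only shows how many such calls could be avoided.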