Hello,

With that many Term objects in your Query, your application probably spends
most of its time looking up information for each of those Terms.

1/ Do you use RAMDirectory? Loading the whole index into memory should
speed up searching, provided the index is small enough to fit in memory.

2/ You are probably not using the QueryParser, so when you build the
Query you could sort the Term objects before adding them to the
BooleanQuery. Sorted Terms should mean fewer jumps on disk. I have no
benchmarks for this, but logically it should have some positive effect when
using FSDirectory. Am I wrong?
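
To make point 2 concrete, here is a minimal sketch of the sorting step only.
It does not use Lucene itself: FieldTerm is a hypothetical stand-in for Term,
and sortForLookup is a name I made up for illustration. The one thing it
relies on is that Lucene orders terms by field first and then by text, which
is what Term.compareTo does. Whether this actually saves disk seeks is
untested, as I said above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class SortedTermsSketch {

    /** Hypothetical stand-in for a Lucene Term: a (field, text) pair. */
    static final class FieldTerm {
        final String field;
        final String text;

        FieldTerm(String field, String text) {
            this.field = field;
            this.text = text;
        }

        @Override
        public String toString() {
            return field + ":" + text;
        }
    }

    /** Orders by field, then by text, mirroring how Term.compareTo orders terms. */
    static final Comparator<FieldTerm> INDEX_ORDER = (a, b) -> {
        int byField = a.field.compareTo(b.field);
        return byField != 0 ? byField : a.text.compareTo(b.text);
    };

    /** Returns a sorted copy; the caller's list is left untouched. */
    static List<FieldTerm> sortForLookup(List<FieldTerm> terms) {
        List<FieldTerm> sorted = new ArrayList<>(terms);
        sorted.sort(INDEX_ORDER);
        return sorted;
    }

    public static void main(String[] args) {
        List<FieldTerm> terms = Arrays.asList(
            new FieldTerm("field2", "w"),
            new FieldTerm("field1", "w2"),
            new FieldTerm("field1", "w10"),
            new FieldTerm("field1", "w1"));

        // Prints [field1:w1, field1:w10, field1:w2, field2:w]
        System.out.println(sortForLookup(terms));
    }
}
```

In the real application you would then walk the sorted list and add each
term to the BooleanQuery as an optional TermQuery clause, in that order.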

3/ There was a patch submitted by Dmitry Serebrennikov
(http://www.mail-archive.com/[EMAIL PROTECTED]/msg02762.html)
which reduced garbage collection by limiting the creation of temporary Term
objects. This patch has not been included in the Lucene code (perhaps
because of a bug in it?).

Hope it helps.

Julien

----- Original Message -----
From: "Jie Yang" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Wednesday, November 12, 2003 10:11 PM
Subject: Poor Performance when searching for 500+ terms


> I know this is rare, but I am building an application
> that submits searches with 500+ search terms. A
> general example would be
>
> field1:w1 OR field1:w2 OR ... OR field1:w500
>
> For 1 million documents, the performance is OK if
> field1 in each document has fewer than 50 terms: I
> get results in < 1 sec. But if field1 has more than
> 400 terms per document on average, the performance
> degrades to around 6 secs.
>
> Is there any way to improve this?
>
> My second question is that my query often comes
> with an AND condition on another search word, for
> example:
>
> field2:w AND (field1:w1 OR field1:w2 OR ... OR field1:w500)
>
> field2:w will return fewer than 1000 records out
> of 1 million, so I thought I could use a
> StringFilter object, i.e. search on field2:w first
> and thus limit the 500-term OR search to the 1000
> field2:w results, somewhat like a join in a database.
> But I checked the code and saw that IndexSearcher always
> performs the 500 disk searches before calling the
> filter object. Any suggestions on this?
>
> Also, does Lucene cache results in memory? I see that
> performance tends to get better after a few runs,
> especially for searches on fields with a small number
> of terms. If so, can I adjust the cache size
> somehow to accommodate fields with a large number of
> terms?
>
> Many thanks.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>

