Paul,

Thanks for the feedback and suggestions. You are of course correct that the implementation I've chosen will perform poorly if a MaxDisjunctionQuery contains a large number of subqueries. However, that is not the case in my usage, nor in the intended primary usage in general. When MaxDisjunctionQuery is used to search a term across multiple fields, there will never be more subqueries than there are distinct fields across which you want to search that term. Especially when combined with the technique of concatenating equally important fields into larger search fields (so that only the reduced set of search fields need be searched), this number is never more than a handful.
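
To make the scoring difference concrete, here is a minimal sketch. It is not the actual MaxDisjunctionQuery code: the class and method names are hypothetical, and each per-field subquery score for a document is assumed to be a plain float already computed elsewhere. It only contrasts taking the maximum with the summing that a disjunction of optional BooleanQuery clauses does, and it ignores any other factors Lucene applies.

```java
// Hypothetical illustration only; these names are not part of Lucene's API.
final class MaxDisjunctionSketch {

    // Score of one document under a max-disjunction: the best score that
    // any single per-field subquery produced for it.
    static float maxScore(float[] fieldScores) {
        float best = 0.0f; // a document matched by no subquery scores 0
        for (float s : fieldScores) {
            if (s > best) {
                best = s;
            }
        }
        return best;
    }

    // For comparison: summing the same per-field scores, as a disjunction
    // of optional clauses would.
    static float sumScore(float[] fieldScores) {
        float total = 0.0f;
        for (float s : fieldScores) {
            total += s;
        }
        return total;
    }
}
```

With only a handful of fields, the linear scan over the per-field scores is trivially cheap, which is why the small-subquery case is the one worth optimizing for.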

I should probably at least add to the comment the fact that the implementation is optimized for small numbers of subqueries.

It's an interesting question how the performance of MaxDisjunctionQuery compares to that of BooleanQuery as the number of subqueries varies. My guess is that MaxDisjunctionQuery is faster for small numbers of subqueries, but that BooleanQuery gets faster for larger numbers, possibly much faster for very large numbers of subqueries (depending on the distribution of the documents being queried). If I have a chance, I'll run some comparative timings out of curiosity.

Did you see my IDF question at the bottom of the original note? I'm really curious why the square of IDF is used for Term and Phrase queries, rather than just IDF. It seems like it might be a bug?

Chuck

> -----Original Message-----
> From: Paul Elschot [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, October 12, 2004 11:04 AM
> To: Lucene Developers List
> Subject: Re: Contribution: better multi-field searching
>
> Chuck,
>
> The scorer keeps a sorted array of subscorers and sorts it
> whenever needed. It's somewhat easier to implement that
> with a util.PriorityQueue, but can't say whether it would be
> faster.
>
> For a definitely faster implementation one can start from
> Lucene's BooleanScorer and assume all clauses
> are optional. Instead of summing just use the maximum.
>
> BooleanScorer works ahead for each scorer to avoid
> the need for keeping the scorers sorted.
> But you'll probably lose skipTo() when using BooleanScorer.
>
> Regards,
> Paul Elschot.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]