I'd be surprised if the function call overhead was significant, but nonetheless I can't argue with optimizing the sum case. However, it would seem this could be achieved without losing the generality by having DisjunctionScorer.advanceAfterCurrent() call the initialization and accumulation methods, while DisjunctionSumScorer overrides advanceAfterCurrent() to implement its optimization. This seems more natural to me than having the general class be optimized for sum.
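To make that concrete, here is a rough sketch of the factoring I have in mind. It is not actual Lucene code: the names DisjunctionSketch, Combiner, init, accumulate, result and combine are all made up for illustration, and combine() only stands in for the score-combining part of advanceAfterCurrent(). The general class calls the hooks, the sum subclass overrides the whole loop so it pays no per-score method calls, and a max-style subclass (as described in footnote (1)) uses the hooks as they are.

```java
// Illustrative sketch only; none of these names are Lucene's.
class DisjunctionSketch {

    /** General case: the loop calls initialization and accumulation hooks. */
    static abstract class Combiner {
        protected abstract void init(float firstScore);
        protected abstract void accumulate(float nextScore);
        protected abstract float result();

        /** Stand-in for the scoring part of advanceAfterCurrent().
         *  Assumes subScores holds at least one score. */
        protected float combine(float[] subScores) {
            init(subScores[0]);
            for (int i = 1; i < subScores.length; i++) {
                accumulate(subScores[i]);
            }
            return result();
        }
    }

    /** Sum case: overrides combine() so the hot loop has no hook calls. */
    static class SumCombiner extends Combiner {
        private float sum;

        @Override protected void init(float firstScore) { sum = firstScore; }
        @Override protected void accumulate(float nextScore) { sum += nextScore; }
        @Override protected float result() { return sum; }

        @Override protected float combine(float[] subScores) {
            float s = 0f;
            for (float score : subScores) {
                s += score; // plain addition, nothing for the JVM to inline
            }
            sum = s;
            return s;
        }
    }

    /** Max case: max plus a small constant times the sum of the other scores. */
    static class MaxTieBreakCombiner extends Combiner {
        private final float tieBreaker; // hypothetical parameter name
        private float max;
        private float sum;

        MaxTieBreakCombiner(float tieBreaker) { this.tieBreaker = tieBreaker; }

        @Override protected void init(float firstScore) {
            max = firstScore;
            sum = firstScore;
        }
        @Override protected void accumulate(float nextScore) {
            sum += nextScore;
            if (nextScore > max) max = nextScore;
        }
        @Override protected float result() {
            // (sum - max) is the sum of the scores other than the maximum
            return max + tieBreaker * (sum - max);
        }
    }

    public static void main(String[] args) {
        float[] scores = {1f, 2f, 3f};
        System.out.println(new SumCombiner().combine(scores));
        System.out.println(new MaxTieBreakCombiner(0.1f).combine(scores));
    }
}
```

Whether the overridden loop is measurably faster than the hook-calling one depends on what the JVM inlines, so this only shows that the generality and the optimization can coexist, not that either costs nothing.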

<soapbox>
I maintain the belief that max is *required* to implement reasonable multi-field searching (1). I can't imagine a case where the current MultiFieldQueryParser actually does the right thing. For users wishing to take a typical query and have it search all fields of their documents, they're going to get horrible results. Maybe they won't notice -- I care about the quality of results, did notice, and was surprised I had to write my own class. After all, Lucene is generally an excellent search engine and it uses a multi-field-based document model (which is a good thing). I would think that good results for multi-field searching out-of-the-box, and therefore built-in support for max, would be viewed as required.

It doesn't really matter to me, because I have made it work right, and will be able to make it work right again with the new scheme. It's just that I really like Lucene and am encouraging others to use it. I love the performance and am glad there is such emphasis placed in this area. I'm also happy there is serious attention paid to ensuring the software is easy to specialize or otherwise customize. However, that same kind of care does not seem to carry over to the quality of built-in relevance ranking, nor to the quality and consistency of the scoring model in general. In these areas, I must say Lucene is weak.

Based on experience in the commercial enterprise search engine market, this is all too common, and the reason that most internal and site searches produce such horrible results. IT people focus on the performance, scalability and architecture only while the users are screaming that the results are no good. I've seen this pattern many places.
</soapbox>

Chuck

(1) Actually MaxDisjunctionScorer does something a little more refined -- it starts with max and then adds in a specified, presumably small, constant times the sum of the other terms.
The max part solves the multi-field problem that is currently in Lucene; i.e., a result matching multiple distinct query terms spread over multiple fields generally gets a higher score than another result matching fewer query terms overall but having the same number of matches in each field. The contribution of the small constant times the sum over the remaining terms allows a result where a term matches in multiple fields to rise above other results matching the same total term set in the same fields but without the multiple matches.

> -----Original Message-----
> From: Paul Elschot [mailto:[EMAIL PROTECTED]
> Sent: Saturday, December 11, 2004 2:05 PM
> To: [EMAIL PROTECTED]
> Subject: Re: Boolean Scorer
>
> Chuck,
>
> On Friday 10 December 2004 23:12, Chuck Williams wrote:
> > Paul,
> >
> > Would there be a way to get the best of both worlds? E.g., could you
> > factor the specializable score combination differently, so that one
> > method was called with each new score to generate a state entity, while
> > a final method computed the score from the state. For both sum and max,
> > the state entity could just be a float, not requiring an array. The
> > final operation for the sum with coord case would do the coord. I
> > haven't looked at the code carefully enough to see if this actually
> > works, but it seemed worth mentioning.
>
> It's simple enough to do some abstract method call instead of initializing
> a sum or adding to it. The problem is that as long as such a call is not
> effectively inlined by the JVM, it will cause a performance hit for the sum
> case.
>
> The latest version of the advanceAfterCurrent method that computes the
> score is java protected. It can be overridden to make the best
> of it in another world.
>
> Regards,
> Paul Elschot
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]