[ https://issues.apache.org/jira/browse/LUCENE-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12704198#action_12704198 ]
Michael McCandless commented on LUCENE-1593:
--------------------------------------------

bq. So I'm now convinced this breaks back-compat.

Woops, yes it does. Grr.

The thing is... I'm not sure we can make such a change even in 3.0. Ie, all that's "special" about 3.0 is we get to remove deprecated APIs, and begin using Java 1.5 language features. I'm not sure if a sudden change in runtime behavior ("you must call Scorer.init() before calling next or skipTo") is allowed.

Maybe we could make a Weight.initializableScorer that returns a Scorer that requires init() be called first. But since Weight is an interface, we can't change it.

So maybe we can make a new abstract class called AbstractWeight (for lack of a better name), implementing Weight. We would deprecate Weight (and remove it at 3.0). We can make a new "get me a Scorer" API in AbstractWeight, eg, require that Scorers returned from there must have "init" called first, pass in an "isTopScorer" boolean, etc. Query would have an "abstractWeight()" method, emulated by wrapping the "weight()" method.

Could something crazy like this work....?

Maybe we should break out the two goals: this [new] goal is simply to migrate away from Weight as an interface to AbstractWeight as an abstract class, then step 2 is to make the optimizations we are discussing here.

This is like running in a potato sack race!

> Optimizations to TopScoreDocCollector and TopFieldCollector
> -----------------------------------------------------------
>
>                 Key: LUCENE-1593
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1593
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Shai Erera
>             Fix For: 2.9
>
>         Attachments: LUCENE-1593.patch, PerfTest.java
>
>
> This is a spin-off of LUCENE-1575 and proposes to optimize TSDC and TFC code
> to remove unnecessary checks. The plan is:
> # Ensure that IndexSearcher returns segments in increasing doc Id order,
> instead of numDocs().
> # Change TSDC and TFC's code to not use the doc id as a tie breaker. New docs
> will always have larger ids and therefore cannot compete.
> # Pre-populate HitQueue with sentinel values in TSDC (score =
> Float.NEGATIVE_INFINITY) and remove the check if reusableSD == null.
> # Also move to use "changing top" and then call adjustTop(), in case we
> update the queue.
> # Some methods in Sort explicitly add SortField.FIELD_DOC as a "tie breaker"
> for the last SortField. But, doing so should not be necessary (since we
> already break ties by docID), and is in fact less efficient (once the above
> optimization is in).
> # Investigate PQ - can we deprecate insert() and have only
> insertWithOverflow()? Add an addDummyObjects method which will populate the
> queue without "arranging" it, just store the objects in the array (this can
> be used to pre-populate sentinel values)?
> I will post a patch as well as some perf measurements as soon as I have them.
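
For readers following along, here is a rough sketch of how the sentinel and "change top, then adjustTop()" ideas from the plan above might fit together. It is not the attached patch and does not use Lucene's HitQueue or PriorityQueue. The names Hit, SentinelTopN, collect and topDocs are made up for illustration, and it is written in current Java (generics, lambdas) for brevity. It assumes docs are collected in increasing id order and that real scores are always greater than negative infinity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a scored hit; not Lucene's ScoreDoc.
final class Hit {
    int doc;
    float score;

    Hit(int doc, float score) {
        this.doc = doc;
        this.score = score;
    }
}

// Hypothetical fixed-size min-heap that starts out full of sentinels.
final class SentinelTopN {
    private final Hit[] heap; // 1-based; heap[1] is always the weakest hit
    private final int size;

    SentinelTopN(int n) {
        if (n < 1) throw new IllegalArgumentException("n must be >= 1");
        size = n;
        heap = new Hit[n + 1];
        // All sentinels compare equal, so they can be stored without arranging.
        for (int i = 1; i <= n; i++) {
            heap[i] = new Hit(Integer.MAX_VALUE, Float.NEGATIVE_INFINITY);
        }
    }

    // Assumes increasing doc ids: a new doc that only ties the weakest hit
    // has a larger id, so it cannot compete and no doc id check is needed.
    void collect(int doc, float score) {
        Hit top = heap[1];
        if (score <= top.score) return;
        // "Change top": overwrite the weakest entry in place, then restore order.
        top.doc = doc;
        top.score = score;
        adjustTop();
    }

    // a is weaker than b: lower score, or same score and larger doc id.
    private static boolean lessThan(Hit a, Hit b) {
        return a.score < b.score || (a.score == b.score && a.doc > b.doc);
    }

    // Sift the (possibly changed) root down to its proper position.
    private void adjustTop() {
        Hit node = heap[1];
        int i = 1;
        while (2 * i <= size) {
            int child = 2 * i;
            if (child + 1 <= size && lessThan(heap[child + 1], heap[child])) child++;
            if (!lessThan(heap[child], node)) break;
            heap[i] = heap[child];
            i = child;
        }
        heap[i] = node;
    }

    // Doc ids of the real hits, best first; leftover sentinels are skipped.
    int[] topDocs() {
        List<Hit> hits = new ArrayList<>();
        for (int i = 1; i <= size; i++) {
            if (heap[i].score != Float.NEGATIVE_INFINITY) hits.add(heap[i]);
        }
        hits.sort((a, b) -> a.score != b.score
                ? Float.compare(b.score, a.score)
                : Integer.compare(a.doc, b.doc));
        int[] docs = new int[hits.size()];
        for (int i = 0; i < docs.length; i++) docs[i] = hits.get(i).doc;
        return docs;
    }
}

final class SentinelDemo {
    public static void main(String[] args) {
        SentinelTopN q = new SentinelTopN(3);
        float[] scores = {1.0f, 3.0f, 2.0f, 3.0f, 0.5f};
        for (int doc = 0; doc < scores.length; doc++) q.collect(doc, scores[doc]);
        System.out.println(Arrays.toString(q.topDocs())); // prints [1, 3, 2]
    }
}
```

Because the queue is full from the start, collect() never has to ask whether the queue has room or whether a reusable entry exists. The doc id comparison stays inside the heap ordering only, which is roughly the split the plan describes.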