[ 
https://issues.apache.org/jira/browse/LUCENE-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703362#action_12703362
 ] 

Shai Erera commented on LUCENE-1593:
------------------------------------

bq. I wonder if, instead, we could define an Object getSentinelObject(), 
returning null by default, and the pqueue on creation would call that and if 
it's non-null, fill the queue (by calling it maxSize times)?

Some extensions of PQ may not know how to construct such a sentinel object. 
Consider ComparablePQ, which assumes all items are Comparable. Unlike HitQueue, 
it does not know what will be the items stored in the queue. But ... I guess it 
can return a Comparable which always prefers the other element ... so maybe 
that's not a good example.
I just have the feeling that a setter method will give us more freedom, rather 
than having to extend PQ just for that ... 

bq. Actually.... shouldn't Weight.scorer(...) in general be the place where 
such "pre-next() initializatoin" is done?

Ok - so BS's add() is only called from BS2.score(Collector). Therefore BS can 
be initialized from BS2 directly. Since both are package-private, we should be 
safe.
BS2's add() is called from BooleanWeight.scorer() (I'm sorry if I repeat what 
you wrote above, but that's just me learning the code), and so can be 
initialized from there ... hmm I wonder why this wasn't done so far?

I'll give it a try.

bq. That basically undoes the "don't fallback to docID" optimization right?

Right ... it's too late for me :)

bq. I guess since the 6 classes are hidden under the TopFieldCollector.create 
it's maybe not so bad?

It's just that maintaining that class becomes more and more problematic. It 
already contains 6 inner classes, which duplicate the code to avoid 'if' 
statements. Meaning every time a bug is found, all 6 need to be checked and 
fixed. With that proposal, it means 12 ...

But I wonder from where would we control it ... IndexSearcher no longer has a 
ctor which allows to define whether docs should be collected in order or not 
(the patch removes it). The only other place where it's defined is in 
BooleanQuery's static setter (affects all boolean queries). But BooleanWeight 
receives the Collector, and does not create it ...
So, if we check in IndexSearcher's search() methods whether this parameter is 
set or not, we can control the creation of TSDC and TFC. And if anyone else 
instantiates them on his own, he should know whether he executes searches 
in-order or not. Back-compat-wise, TFC and TSDC are still in trunk and haven't 
been released, so we shouldn't have a problem right?

Does that sound like a good approach? I still hate to duplicate the code in 
TFC, but I don't think there's any other choice. Maybe create completely 
separate classes for TFC and TSDC? although that will make code maintenance 
even harder.

> Optimizations to TopScoreDocCollector and TopFieldCollector
> -----------------------------------------------------------
>
>                 Key: LUCENE-1593
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1593
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Shai Erera
>             Fix For: 2.9
>
>         Attachments: LUCENE-1593.patch, PerfTest.java
>
>
> This is a spin-off of LUCENE-1575 and proposes to optimize TSDC and TFC code 
> to remove unnecessary checks. The plan is:
> # Ensure that IndexSearcher returns segements in increasing doc Id order, 
> instead of numDocs().
> # Change TSDC and TFC's code to not use the doc id as a tie breaker. New docs 
> will always have larger ids and therefore cannot compete.
> # Pre-populate HitQueue with sentinel values in TSDC (score = Float.NEG_INF) 
> and remove the check if reusableSD == null.
> # Also move to use "changing top" and then call adjustTop(), in case we 
> update the queue.
> # some methods in Sort explicitly add SortField.FIELD_DOC as a "tie breaker" 
> for the last SortField. But, doing so should not be necessary (since we 
> already break ties by docID), and is in fact less efficient (once the above 
> optimization is in).
> # Investigate PQ - can we deprecate insert() and have only 
> insertWithOverflow()? Add a addDummyObjects method which will populate the 
> queue without "arranging" it, just store the objects in the array (this can 
> be used to pre-populate sentinel values)?
> I will post a patch as well as some perf measurements as soon as I have them.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

Reply via email to