[ 
https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848896#action_12848896
 ] 

Shai Erera commented on LUCENE-2215:
------------------------------------

I've reviewed PagingCollector.java and the first thing I have to say about it 
is that I really like it ! :) Saves lots of unnecessary heapify code, if the 
application can allow itself to store the lowest last SD.

I have few comments/questions.

I don't understand what getLastScoreDoc is for? Is it just a utility method? Is 
it something the app can compute by itself? Anyway, it lacks javadocs, so 
perhaps if they existed I wouldn't need to ask ;).

In collect(), there's the following code:
{code}
                } else if (score == previousPassLowest.score && doc <= 
previousPassLowest.doc) {
                        // if the scores are the same and the doc is less than 
or equal to
                        // the
                        // previous pass lowest hit doc then skip because this 
collector
                        // favors
                        // lower number documents.
                        return;
{code}

I think there's a typo in the comment "favors lower number documents" .. while 
it seems to prefer higher doc IDs? The way I understand it, irregardless of 
whether docs are collected in/out of order, HitQueue ensures that when scores 
are equals, the lowest IDs are favored. Thus the first round always keeps the 
lowest IDs among the docs whose scores match. The next round will favor the 
docs whose IDs come next, and so forth ... am I right? (just clarifying my 
understanding).
If that's the case, I think it'll be good if it's spelled out in the comment, 
and also mention that it means that document has already been returned 
previously (like it's documented in the previous 'if').

The last 'else' really looks like TSDC's out-of-order version, which makes me 
think whether PagingCollector can be viewed as a filter on top of TSDC (and 
possibly even TopFieldCollector)? So if a hit should be collected, it just 
calls super.collect? I realize though that a Collector is a hotspot and we want 
to minimize 'if' let alone method call statements as much as possible. But it 
just feels so strong that it should be a filter ... :). And you wouldn't need 
to specifically handle in/out orderness ... and w/ the right design, it can 
also wrap a TFC or any other TDC implementation ...

BTW, I've noticed that you don't track maxScore - is it assumed that the 
application stores it from the first round? If so I'd document it, because the 
application needs to know it should use TSDC the first round, and 
PagingCollector the second round.

Also, PagingCollector offers a ctor which does not force the application to 
pass in a ScoreDoc. See my comment from above - it might be misleading, because 
if you use this collector right from the very first search, you lose the 
maxScore tracking. I also don't see why it should be allowed - if a dummy 
previousPassLowest ScoreDoc is used, collect() does a lot of unnecessary 'if's. 
I think this collector should be used only from the second round, and a single 
ctor which forces a ScoreDoc to be passed would make more sense. If the 
application wishes to shoot itself in the leg (performance-wise), it can pass a 
dummy SD itself.

> paging collector
> ----------------
>
>                 Key: LUCENE-2215
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2215
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: 2.4, 3.0
>            Reporter: Adam Heinz
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: IterablePaging.java, PagingCollector.java, 
> TestingPagingCollector.java
>
>
> http://issues.apache.org/jira/browse/LUCENE-2127?focusedCommentId=12796898&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12796898
> Somebody assign this to Aaron McCurry and we'll see if we can get enough 
> votes on this issue to convince him to upload his patch.  :)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

Reply via email to