[ https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848896#action_12848896 ]
Shai Erera commented on LUCENE-2215: ------------------------------------ I've reviewed PagingCollector.java and the first thing I have to say about it is that I really like it ! :) Saves lots of unnecessary heapify code, if the application can allow itself to store the lowest last SD. I have few comments/questions. I don't understand what getLastScoreDoc is for? Is it just a utility method? Is it something the app can compute by itself? Anyway, it lacks javadocs, so perhaps if they existed I wouldn't need to ask ;). In collect(), there's the following code: {code} } else if (score == previousPassLowest.score && doc <= previousPassLowest.doc) { // if the scores are the same and the doc is less than or equal to // the // previous pass lowest hit doc then skip because this collector // favors // lower number documents. return; {code} I think there's a typo in the comment "favors lower number documents" .. while it seems to prefer higher doc IDs? The way I understand it, irregardless of whether docs are collected in/out of order, HitQueue ensures that when scores are equals, the lowest IDs are favored. Thus the first round always keeps the lowest IDs among the docs whose scores match. The next round will favor the docs whose IDs come next, and so forth ... am I right? (just clarifying my understanding). If that's the case, I think it'll be good if it's spelled out in the comment, and also mention that it means that document has already been returned previously (like it's documented in the previous 'if'). The last 'else' really looks like TSDC's out-of-order version, which makes me think whether PagingCollector can be viewed as a filter on top of TSDC (and possibly even TopFieldCollector)? So if a hit should be collected, it just calls super.collect? I realize though that a Collector is a hotspot and we want to minimize 'if' let alone method call statements as much as possible. But it just feels so strong that it should be a filter ... :). And you wouldn't need to specifically handle in/out orderness ... and w/ the right design, it can also wrap a TFC or any other TDC implementation ... BTW, I've noticed that you don't track maxScore - is it assumed that the application stores it from the first round? If so I'd document it, because the application needs to know it should use TSDC the first round, and PagingCollector the second round. Also, PagingCollector offers a ctor which does not force the application to pass in a ScoreDoc. See my comment from above - it might be misleading, because if you use this collector right from the very first search, you lose the maxScore tracking. I also don't see why it should be allowed - if a dummy previousPassLowest ScoreDoc is used, collect() does a lot of unnecessary 'if's. I think this collector should be used only from the second round, and a single ctor which forces a ScoreDoc to be passed would make more sense. If the application wishes to shoot itself in the leg (performance-wise), it can pass a dummy SD itself. > paging collector > ---------------- > > Key: LUCENE-2215 > URL: https://issues.apache.org/jira/browse/LUCENE-2215 > Project: Lucene - Java > Issue Type: New Feature > Components: Search > Affects Versions: 2.4, 3.0 > Reporter: Adam Heinz > Assignee: Grant Ingersoll > Priority: Minor > Attachments: IterablePaging.java, PagingCollector.java, > TestingPagingCollector.java > > > http://issues.apache.org/jira/browse/LUCENE-2127?focusedCommentId=12796898&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12796898 > Somebody assign this to Aaron McCurry and we'll see if we can get enough > votes on this issue to convince him to upload his patch. :) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. --------------------------------------------------------------------- To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org