[jira] Commented: (LUCENE-2127) Improved large result handling

Michael McCandless (JIRA) Wed, 06 Jan 2010 04:35:28 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-2127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797095#action_12797095
 ]


Michael McCandless commented on LUCENE-2127:
--------------------------------------------

I think you should be able to use FieldComparator for the "large queue".

If your "large queue" is a list/array, then it has slots, and you just 
reference those slots when asking FieldComparator to compare.

The only difference from the FieldComparator's standpoint is you are doing far 
fewer compare calls as-you-go, then periodically you have a big burst of 
compares (the "selection algorithm", which should be implementable on 
FieldComparator) to find a new rough bottom.

But... have you thought about theoretical cost of "true pqueue" vs "approximate 
pqueue"?  I think in the worst case, where results are returned in precisely 
reverse sort order, so that you always fully turnover the queue, is tricky.

Say your queue size is 10K, but you tolerate say 20 K until pruning.  With 
worst case, every 10K hits you will have to re-select to dump the 10K worst.  
So cost is O(N * M), where M = queue size and N = total number of hits, though 
with a smallish 1/10K constant.

With true pqueue, every hit pays log(M) price, so total cost is O(N * log(M)).

But actually that 1/10K constant = 1/M, and so I think the approximate PQ works 
out to O(N) cost, which is in fact much cheaper.  I think?

> Improved large result handling
> ------------------------------
>
>                 Key: LUCENE-2127
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2127
>             Project: Lucene - Java
>          Issue Type: New Feature
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>
> Per 
> http://search.lucidimagination.com/search/document/350c54fc90d257ed/lots_of_results#fbb84bd297d15dd5,
>  it would be nice to offer some other Collectors that are better at handling 
> really large number of results.  This could be implemented in a variety of 
> ways via Collectors.  For instance, we could have a raw collector that does 
> no sorting and just returns the ScoreDocs, or we could do as Mike suggests 
> and have Collectors that have heuristics about memory tradeoffs and only 
> heapify when appropriate.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2127) Improved large result handling

Reply via email to