[jira] Updated: (LUCENE-2127) Improved large result handling

Grant Ingersoll (JIRA) Wed, 06 Jan 2010 12:04:25 -0800

     [ 
https://issues.apache.org/jira/browse/LUCENE-2127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Grant Ingersoll updated LUCENE-2127:
------------------------------------

    Attachment: LUCENE-2127.patch

OK, I think this has some legs, assuming I did everything right (especially the 
benchmarker stuff).

Here's what I did:
1. Added a postCollect() method to Collector as an empty method
2. Hooked it into IndexSearcher, MultiSearcher and ParallelMultiSearcher.  I'm 
not sure I have all of the search paths covered yet, but...
3. Hooked in the ability to specify the collector in benchmarker (see 
collector.alg)
4. Added a new LongToEnglishContentSource and QueryMaker that can generate a 
pretty much unlimited number of docs, based on the English.java test util.
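
To make items 1 and 2 concrete, here is a minimal, self-contained sketch of 
the idea. It is not the code in the patch: SketchCollector, SortAfterCollector 
and SketchSearcher are hypothetical stand-ins that only mimic the shape of 
Lucene's Collector and searcher loop, and the collect(doc, score) signature is 
an assumption made to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for Collector, reduced to what the sketch needs. */
abstract class SketchCollector {
    abstract void collect(int doc, float score);

    /** Empty by default, so collectors that do not need the hook are unaffected. */
    void postCollect() {}
}

/** Gathers hits unsorted while collecting, then sorts once in postCollect(). */
class SortAfterCollector extends SketchCollector {
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    private final List<Hit> hits = new ArrayList<>();

    @Override
    void collect(int doc, float score) {
        hits.add(new Hit(doc, score)); // no heap maintenance per hit
    }

    @Override
    void postCollect() {
        // Highest score first; ties broken by ascending doc id.
        hits.sort((a, b) -> {
            int byScore = Float.compare(b.score, a.score);
            return byScore != 0 ? byScore : Integer.compare(a.doc, b.doc);
        });
    }

    List<Hit> hits() {
        return hits;
    }
}

/** Hypothetical search loop showing where the hook would be invoked. */
final class SketchSearcher {
    static void search(int[] docs, float[] scores, SketchCollector collector) {
        for (int i = 0; i < docs.length; i++) {
            collector.collect(docs[i], scores[i]);
        }
        collector.postCollect(); // called once, after the last hit
    }
}
```

The point of the hook is that the searcher calls it exactly once after all 
hits have been delivered, so a collector can defer expensive work such as 
sorting until it knows the full result set.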

Prelim results (unvalidated) retrieving up to 1M records (out of 2M):
{quote}
------------> Report sum by Prefix (SearchCollector) and Round (4 about 4 out of 8000034)
Operation           round  coll                                               runCnt  recsPerRun  rec/s  elapsedSec   avgUsedMem    avgTotalMem
SearchCollector_10      0  org.apache.lucene.search.PostCollectSortCollector       1          10   0.14       73.32  290,371,776    386,625,536
SearchCollector_10      1  topDocOrdered                                           1          10   0.10       98.37  449,582,048    588,189,696
SearchCollector_10      2  org.apache.lucene.search.PostCollectSortCollector       1          10   0.14       71.47  964,864,512  1,016,311,808
SearchCollector_10      3  topDocOrdered                                           1          10   0.10       98.73  791,313,664  1,016,311,808
{quote}

Still lots to do, but I wanted to put it up for people to look at and tell me 
what I'm doing wrong.  I'd also love to hook in the FieldComparator stuff.  
Even a wrapper that took a FieldComparator and exposed it as a regular 
Comparator would be cool.
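
A rough sketch of what such a wrapper might look like follows. It does not use 
the real FieldComparator class: SlotComparator is a hypothetical interface 
that only borrows the slot-based copy/compare idea, and IntFieldSlotComparator 
and SlotComparatorAdapter are illustrative names, not part of Lucene or the 
patch.

```java
import java.util.Comparator;

/** Hypothetical slot-based comparator, loosely modeled on the FieldComparator idea. */
interface SlotComparator {
    /** Copies the sort value of the given doc into the given slot. */
    void copy(int slot, int doc);

    /** Compares the values previously copied into the two slots. */
    int compare(int slot1, int slot2);
}

/** Adapts a SlotComparator to a regular Comparator over doc ids. */
final class SlotComparatorAdapter implements Comparator<Integer> {
    private final SlotComparator delegate;

    SlotComparatorAdapter(SlotComparator delegate) {
        this.delegate = delegate;
    }

    @Override
    public int compare(Integer docA, Integer docB) {
        // Two scratch slots are enough: load both docs, then compare the slots.
        delegate.copy(0, docA);
        delegate.copy(1, docB);
        return delegate.compare(0, 1);
    }
}

/** Example delegate that sorts by an int value stored per doc. */
final class IntFieldSlotComparator implements SlotComparator {
    private final int[] fieldValues; // fieldValues[doc] is the sort value of doc
    private final int[] slots = new int[2];

    IntFieldSlotComparator(int[] fieldValues) {
        this.fieldValues = fieldValues;
    }

    @Override
    public void copy(int slot, int doc) {
        slots[slot] = fieldValues[doc];
    }

    @Override
    public int compare(int slot1, int slot2) {
        return Integer.compare(slots[slot1], slots[slot2]);
    }
}
```

Because the adapter reuses two scratch slots, it is not safe to share across 
threads, and each comparison pays for two copies; that cost would matter when 
sorting very large result sets.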


> Improved large result handling
> ------------------------------
>
>                 Key: LUCENE-2127
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2127
>             Project: Lucene - Java
>          Issue Type: New Feature
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: LUCENE-2127.patch
>
>
> Per 
> http://search.lucidimagination.com/search/document/350c54fc90d257ed/lots_of_results#fbb84bd297d15dd5,
>  it would be nice to offer some other Collectors that are better at handling 
> really large numbers of results.  This could be implemented in a variety of 
> ways via Collectors.  For instance, we could have a raw collector that does 
> no sorting and just returns the ScoreDocs, or we could do as Mike suggests 
> and have Collectors that have heuristics about memory tradeoffs and only 
> heapify when appropriate.

