[ https://issues.apache.org/jira/browse/LUCENE-2127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Grant Ingersoll updated LUCENE-2127:
------------------------------------
    Attachment: LUCENE-2127.patch

OK, I think this has some legs, assuming I did everything right (especially the benchmarker stuff). Here's what I did:

1. Added a postCollect() method to Collector as an empty method.
2. Hooked it into IndexSearcher, MultiSearcher and ParallelMultiSearcher. I'm not sure I have all of the search paths covered yet, but...
3. Hooked in the ability to specify the collector in the benchmarker (see collector.alg).
4. Added a new LongToEnglishContentSource and QueryMaker to create a pretty much infinitely scalable number of docs based off the English.java test util.

Prelim results (unvalidated) retrieving up to 1M records (out of 2M):

{quote}
------------> Report sum by Prefix (SearchCollector) and Round (4 about 4 out of 8000034)
Operation           round  coll                                               runCnt  recsPerRun  rec/s  elapsedSec   avgUsedMem    avgTotalMem
SearchCollector_10      0  org.apache.lucene.search.PostCollectSortCollector       1          10   0.14       73.32  290,371,776    386,625,536
SearchCollector_10      1  topDocOrdered                                           1          10   0.10       98.37  449,582,048    588,189,696
SearchCollector_10      2  org.apache.lucene.search.PostCollectSortCollector       1          10   0.14       71.47  964,864,512  1,016,311,808
SearchCollector_10      3  topDocOrdered                                           1          10   0.10       98.73  791,313,664  1,016,311,808
{quote}

Still lots to do, but I wanted to put it up for people to look at and tell me what I'm doing wrong. I'd also love to hook in the FieldComparator stuff. Even a wrapper that took a FieldComparator inside of a regular Comparator would be cool.
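
To make the postCollect() idea concrete, here is a minimal sketch of the pattern: gather hits unsorted during collection, then sort once when the searcher signals that collection is done. It is not the code in the attached patch and does not use Lucene's real Collector API. SimpleCollector, Hit, SortAtEndCollector and BY_SCORE_DESC are hypothetical stand-ins made up for illustration, and the ordering is a plain java.util.Comparator rather than a FieldComparator.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Sketch only: simplified stand-ins, not Lucene's real Collector API. */
final class PostCollectSketch {

    /** Hypothetical hit record: a doc id plus its score. */
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    /** Hypothetical, simplified collector with an end-of-search hook. */
    static abstract class SimpleCollector {
        abstract void collect(int doc, float score);

        /** Called once after the last hit has been collected; empty by default. */
        void postCollect() { }
    }

    /** Appends hits in arrival order and sorts them once in postCollect(). */
    static final class SortAtEndCollector extends SimpleCollector {
        private final List<Hit> hits = new ArrayList<>();
        private final Comparator<Hit> order;

        SortAtEndCollector(Comparator<Hit> order) { this.order = order; }

        @Override
        void collect(int doc, float score) {
            hits.add(new Hit(doc, score)); // no heap maintenance per hit
        }

        @Override
        void postCollect() {
            hits.sort(order); // a single sort over everything collected
        }

        /** Returns a copy of the first n hits; meaningful only after postCollect(). */
        List<Hit> top(int n) {
            return new ArrayList<>(hits.subList(0, Math.min(n, hits.size())));
        }
    }

    /** Score descending, ties broken by ascending doc id. */
    static final Comparator<Hit> BY_SCORE_DESC = (a, b) -> {
        int c = Float.compare(b.score, a.score);
        return c != 0 ? c : Integer.compare(a.doc, b.doc);
    };

    public static void main(String[] args) {
        SortAtEndCollector collector = new SortAtEndCollector(BY_SCORE_DESC);
        float[] scores = {0.2f, 0.9f, 0.5f, 0.9f};
        for (int doc = 0; doc < scores.length; doc++) {
            collector.collect(doc, scores[doc]);
        }
        collector.postCollect(); // in the proposal, the searcher would make this call
        for (Hit h : collector.top(3)) {
            System.out.println(h.doc + " " + h.score);
        }
    }
}
```

The trade-off in this sketch is memory for simplicity: every hit is retained until postCollect() runs, which only makes sense when a large fraction of the matches is actually wanted.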

> Improved large result handling
> ------------------------------
>
>                 Key: LUCENE-2127
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2127
>             Project: Lucene - Java
>          Issue Type: New Feature
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: LUCENE-2127.patch
>
>
> Per
> http://search.lucidimagination.com/search/document/350c54fc90d257ed/lots_of_results#fbb84bd297d15dd5,
> it would be nice to offer some other Collectors that are better at handling
> really large number of results. This could be implemented in a variety of
> ways via Collectors. For instance, we could have a raw collector that does
> no sorting and just returns the ScoreDocs, or we could do as Mike suggests
> and have Collectors that have heuristics about memory tradeoffs and only
> heapify when appropriate.