Re: 500 millions document for loop.

Valentin Popov Thu, 12 Nov 2015 10:00:55 -0800

Toke, I just looked through our code, and we are already using that method:

IndexSearcher indexSearcher = getIndexSearcher(searchResult);

TopDocs topDocs;
// Cursor: the last hit of the previous page.
ScoreDoc currectScoreDoc = p.startScoreDoc;
for (int page = 1; page < pages - 1; page++) {
    // Fetch the next page, starting right after the cursor.
    topDocs = indexSearcher.searchAfter(currectScoreDoc, query, queryFilter,
            searchResult.getPageSize(), sort);
    int endpos = topDocs.scoreDocs.length - 1;
    if (endpos > 0) {
        startIdx += topDocs.scoreDocs.length;
        // Advance the cursor to the last hit of this page.
        currectScoreDoc = topDocs.scoreDocs[endpos];
        searchResult.setPage(currectScoreDoc, startIdx);
    }

    topDocs = null; // release the page before the next iteration

    // Stop early if the caller cancelled the search.
    if (searchResult.getCancelled()) {
        return searchResult;
    }
}
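
To make the idea concrete without Lucene, here is a minimal, self-contained sketch of cursor-style paging combined with the «to» counting we need. It is only an illustration under stated assumptions: the class CursorPagingSketch, the Doc type, and the methods pageAfter and countRecipients are hypothetical names, and a sorted in-memory list stands in for the index. It is not our production code and not a Lucene API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CursorPagingSketch {

    /** Hypothetical stand-in for an indexed document. */
    static final class Doc {
        final int id;
        final long archiveDate;
        final String to;

        Doc(int id, long archiveDate, String to) {
            this.id = id;
            this.archiveDate = archiveDate;
            this.to = to;
        }
    }

    /** Paging order: archive date first, document id as a unique tie-breaker. */
    static final Comparator<Doc> ORDER = Comparator
            .comparingLong((Doc d) -> d.archiveDate)
            .thenComparingInt(d -> d.id);

    /**
     * Returns up to pageSize docs that sort strictly after the cursor,
     * or the first page when the cursor is null. The list must be in ORDER.
     */
    static List<Doc> pageAfter(List<Doc> sorted, Doc cursor, int pageSize) {
        int start = 0;
        if (cursor != null) {
            int pos = Collections.binarySearch(sorted, cursor, ORDER);
            // Found: continue right after it. Not found: continue at the insertion point.
            start = pos >= 0 ? pos + 1 : -pos - 1;
        }
        int end = Math.min(sorted.size(), start + pageSize);
        return new ArrayList<>(sorted.subList(start, end));
    }

    /** Walks all pages by cursor and counts documents per «to» address. */
    static Map<String, Integer> countRecipients(List<Doc> docs, int pageSize) {
        List<Doc> sorted = new ArrayList<>(docs);
        sorted.sort(ORDER);

        Map<String, Integer> counts = new TreeMap<>();
        Doc cursor = null;
        while (true) {
            List<Doc> page = pageAfter(sorted, cursor, pageSize);
            if (page.isEmpty()) {
                break; // no more results
            }
            for (Doc d : page) {
                counts.merge(d.to, 1, Integer::sum);
            }
            cursor = page.get(page.size() - 1); // last hit becomes the next cursor
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
                new Doc(1, 100L, "a@example.com"),
                new Doc(2, 100L, "b@example.com"),
                new Doc(3, 200L, "a@example.com"),
                new Doc(4, 300L, "c@example.com"),
                new Doc(5, 300L, "a@example.com"));
        System.out.println(countRecipients(docs, 2));
    }
}
```

The point of the sketch is that each page is located from the last hit of the previous page rather than from an offset, so the work per page does not grow with how deep we are, and the loop ends on the first empty page instead of relying on a precomputed page count.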



> On 12 Nov 2015, at 20:42, Toke Eskildsen <t...@statsbiblioteket.dk> wrote:
> 
> Valentin Popov <valentin...@gmail.com> wrote:
> 
>> We have ~10 indexes holding 500M documents. Each document
>> has an «archive date» and a «to» address, and one of our tasks is
>> to calculate statistics on «to» for the last year. Right now we
>> search for archive_date:(current_date - 1 year) and paginate the
>> results at 50k records per page. The bottleneck of that approach
>> is pagination: even on a powerful server the job takes
>> ~20 days to execute, which is far too long.
> 
> Lucene does not like deep page requests due to the way the internal Priority 
> Queue works. Solr has CursorMark, which should be fairly simple to emulate in 
> your Lucene handling code:
> 
> http://lucidworks.com/blog/2013/12/12/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/
> 
> - Toke Eskildsen
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
> 

Regards,
Valentin Popov




