Re: 500 million documents for loop.

Valentin Popov Thu, 12 Nov 2015 09:48:12 -0800

Toke, thanks! 

We will look at this solution; it looks like exactly what we need.



> On 12 Nov 2015, at 20:42, Toke Eskildsen <t...@statsbiblioteket.dk> wrote:
> 
> Valentin Popov <valentin...@gmail.com> wrote:
> 
>> We have ~10 indexes containing 500M documents. Each document
>> has an «archive date» and a «to» address, and one of our tasks is to
>> calculate statistics on «to» for the last year. Right now we
>> search for archive_date:(current_date - 1 year) and paginate the
>> results at 50k records per page. The bottleneck of that approach is
>> the pagination: it takes too long, and even on a powerful server the
>> job takes ~20 days to execute.
> 
> Lucene does not like deep page requests due to the way the internal Priority 
> Queue works. Solr has CursorMark, which should be fairly simple to emulate in 
> your Lucene handling code:
> 
> http://lucidworks.com/blog/2013/12/12/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/
> 
> - Toke Eskildsen
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
> 

Regards,
Valentin Popov
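A minimal sketch of the cursor idea Toke describes, written in plain Java so it runs without a Lucene dependency. With offset paging, page p asks the collector for the top (p * 50k + 50k) hits and throws most of them away, so the priority queue grows with page depth. A cursor instead remembers the last hit of the previous page and keeps a queue of only pageSize entries, skipping anything that sorts at or before the cursor. Everything below (the Doc record, nextPage, the (archiveDate, docId) ordering) is hypothetical and only illustrates the technique; in Lucene itself the corresponding call is IndexSearcher.searchAfter(ScoreDoc after, Query query, int n), which takes the last ScoreDoc of the previous page.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class CursorPagingSketch {

    // Hypothetical stand-in for an indexed document: a sort key plus a unique doc id.
    record Doc(long archiveDate, int docId) {}

    // Total order used for paging: by archiveDate, ties broken by the unique docId,
    // so "sorts after the cursor" is always well defined.
    static final Comparator<Doc> ORDER =
            Comparator.comparingLong(Doc::archiveDate).thenComparingInt(Doc::docId);

    /**
     * Returns up to pageSize docs that sort strictly after `after`
     * (null means start from the beginning). Every candidate is scanned,
     * but only pageSize entries are ever held in the bounded heap,
     * no matter how deep into the result set the cursor is.
     */
    static List<Doc> nextPage(List<Doc> allDocs, Doc after, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        // Max-heap on ORDER: the head is the largest doc currently kept.
        PriorityQueue<Doc> heap = new PriorityQueue<>(pageSize, ORDER.reversed());
        for (Doc d : allDocs) {
            if (after != null && ORDER.compare(d, after) <= 0) {
                continue; // already returned on an earlier page
            }
            if (heap.size() < pageSize) {
                heap.add(d);
            } else if (ORDER.compare(d, heap.peek()) < 0) {
                heap.poll(); // evict the current largest, keep the smaller doc
                heap.add(d);
            }
        }
        List<Doc> page = new ArrayList<>(heap);
        page.sort(ORDER);
        return page;
    }

    public static void main(String[] args) {
        List<Doc> docs = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            docs.add(new Doc(1000L - (i % 4), i)); // repeated dates exercise the tie-break
        }
        int visited = 0;
        Doc cursor = null;
        while (true) {
            List<Doc> page = nextPage(docs, cursor, 3);
            if (page.isEmpty()) {
                break;
            }
            visited += page.size(); // a real job would aggregate «to» values here
            cursor = page.get(page.size() - 1); // the "cursor mark": last doc of this page
        }
        System.out.println(visited); // prints 10
    }
}
```

The unique tie-breaker matters: if two documents shared the same sort key and nothing else distinguished them, a page boundary falling between them could skip or repeat hits. Solr's CursorMark requires the uniqueKey field in the sort for the same reason.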