Valentin Popov <valentin...@gmail.com> wrote:

> We have ~10 indexes for 500M documents, each document
> has «archive date», and «to» address, one of our task is
> calculate statistics of «to» for last year. Right now we are
> using search archive_date:(current_date - 1 year) and paginate
> results for 50k records for page. Bottleneck of that approach,
> pagination take too long time and on powerful server it take
> ~20 days to execute, and it is very long.

Lucene handles deep paging poorly: to return page N it has to collect the top
(N * page size) hits in its internal priority queue, so every page costs more
than the one before it. Solr has CursorMark, which should be fairly simple to
emulate in your Lucene handling code:

http://lucidworks.com/blog/2013/12/12/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/
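
To make the cursor idea concrete, here is a minimal sketch in plain Java. It
is not Lucene code: the "index" is just an array of sort values, and the names
CursorPaging, pageAfter and walkAll are made up for illustration. What it shows
is that each page only needs a queue of pageSize entries, because everything at
or before the cursor (the last hit of the previous page) is skipped instead of
being collected again. In Lucene itself, the IndexSearcher.searchAfter methods
are the natural starting point for the same pattern.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Toy stand-in for an index: doc id = array position, sort key = archiveDates[docId].
final class CursorPaging {

    // A hit is {sortValue, docId}; order by sort value, then doc id as tie-breaker
    // so that the cursor position is unambiguous even when sort values repeat.
    static final Comparator<long[]> HIT_ORDER =
            Comparator.<long[]>comparingLong(h -> h[0]).thenComparingLong(h -> h[1]);

    // Returns the next pageSize hits strictly after the cursor.
    // cursor == null means "start from the beginning"; pageSize must be >= 1.
    static List<long[]> pageAfter(long[] archiveDates, long[] cursor, int pageSize) {
        // Max-heap bounded to pageSize entries, no matter how deep into the results we are.
        PriorityQueue<long[]> heap = new PriorityQueue<>(pageSize, HIT_ORDER.reversed());
        for (int doc = 0; doc < archiveDates.length; doc++) {
            long[] hit = {archiveDates[doc], doc};
            if (cursor != null && HIT_ORDER.compare(hit, cursor) <= 0) {
                continue; // already returned on an earlier page
            }
            heap.add(hit);
            if (heap.size() > pageSize) {
                heap.poll(); // drop the worst candidate to keep the heap bounded
            }
        }
        List<long[]> page = new ArrayList<>(heap);
        page.sort(HIT_ORDER);
        return page;
    }

    // Walks the whole result set page by page and returns doc ids in visit order.
    static List<Integer> walkAll(long[] archiveDates, int pageSize) {
        List<Integer> seen = new ArrayList<>();
        long[] cursor = null;
        while (true) {
            List<long[]> page = pageAfter(archiveDates, cursor, pageSize);
            if (page.isEmpty()) {
                break;
            }
            for (long[] hit : page) {
                seen.add((int) hit[1]);
            }
            cursor = page.get(page.size() - 1); // last hit becomes the next cursor
        }
        return seen;
    }
}
```

The sketch still scans every document for every page, which a real index
avoids; the point is only the bounded queue and the (sort value, doc id)
cursor. The cursor stays valid only as long as the sort order is total, which
is why the doc id is used as a tie-breaker.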

- Toke Eskildsen
