Re: how do I paginate Lucene search results deeply

Toke Eskildsen Thu, 14 Mar 2013 03:04:07 -0700

On Thu, 2013-03-14 at 04:11 +0100, dizh wrote:
> each document has a timestamp identify the time which it is indexed, I
> want search the documents using sort, the sort field is the timestamp,


[...]

> but when you do paging, for example in a web app , the user want to go
> to the last 49999980-5000000, well, it is slowly...

Yes. The problem is that it performs a sliding window search with a
window size of page+topX, and that does not work well with 5M entries,
especially as it uses a heap, which works very well for small windows
but horribly for large ones.

> I have a large number of Log4J logs, and I want to index them and
> present them using web ui. 

I still don't see why you would want to page to 5M, but okay.

Instead of representing the timestamps directly, convert them to unique
longs when indexing. Guessing that you always have fewer than 1000 log
entries/ms (10 bits leave room for 1024), your long would be
  (timestamp_in_ms << 10) | counter++
where the counter is reset to 0 each time a different timestamp is
encountered. This also ensures that the order of your log entries is
preserved. Let's call the modified timestamps utime.
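
A minimal sketch of that encoding in plain Java, assuming log entries
arrive in non-decreasing timestamp order. The class and method names
(UtimeEncoder, next, timestampMs) are made up for illustration and are
not part of Lucene. The counter is combined with the shifted timestamp
using bitwise OR, so it occupies the 10 low bits.

```java
/** Sketch: turns millisecond timestamps into unique, order-preserving longs ("utime"). */
final class UtimeEncoder {
    // 10 low bits are reserved for the per-millisecond counter,
    // which leaves room for 1024 entries per millisecond.
    static final int COUNTER_BITS = 10;
    static final long MAX_COUNTER = (1L << COUNTER_BITS) - 1;

    private long lastTimestampMs = Long.MIN_VALUE;
    private long counter = 0;

    /** Returns the utime for the next log entry; assumes timestamps never go backwards. */
    long next(long timestampMs) {
        if (timestampMs != lastTimestampMs) {
            // A new millisecond: restart the counter.
            lastTimestampMs = timestampMs;
            counter = 0;
        }
        if (counter > MAX_COUNTER) {
            // The guess of "fewer than 1024 entries/ms" did not hold.
            throw new IllegalStateException("too many log entries in one millisecond");
        }
        return (timestampMs << COUNTER_BITS) | counter++;
    }

    /** Recovers the original millisecond timestamp from a utime. */
    static long timestampMs(long utime) {
        return utime >> COUNTER_BITS;
    }
}
```

The resulting value is what would be indexed as the numeric sort field
instead of the raw timestamp.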

When you do a paginated search for 20 results, keep track of the last
utime on the page. When you request the next page, add a
NumericRangeFilter going from the last utime (exclusive) with no upper
limit and ask for the top-20 results again. The window then stays at 20
entries no matter how deep you page.
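
The paging logic can be sketched without Lucene. The example below
models the index as a sorted set of utimes and uses an exclusive lower
bound where the real search would use the range filter. UtimePager,
nextPage and BEFORE_FIRST are hypothetical names, and the sketch assumes
results are sorted by ascending utime.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

/** Sketch: "give me the next page after the last utime I saw". */
final class UtimePager {
    // Sentinel for the first request; real utimes are positive.
    static final long BEFORE_FIRST = Long.MIN_VALUE;

    /**
     * Returns up to pageSize utimes strictly greater than lastUtime.
     * The sorted set stands in for the index sorted by utime.
     */
    static List<Long> nextPage(NavigableSet<Long> sortedUtimes, long lastUtime, int pageSize) {
        List<Long> page = new ArrayList<>(pageSize);
        // tailSet(x, false) is the exclusive lower bound with no upper limit.
        for (Long utime : sortedUtimes.tailSet(lastUtime, false)) {
            if (page.size() == pageSize) {
                break;
            }
            page.add(utime);
        }
        return page;
    }
}
```

The caller passes BEFORE_FIRST for the first page and then the last
element of each returned page for the next request. The trade-off is
that this only steps forward from a known position; it does not jump
straight to an arbitrary page number.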


NB: Please get rid of the garbage that follows each of your posts on
this mail list. The Confidentiality Notice has negative value here.

