RE: What is the right way for getting paged results with Lucene?

Michael Garski Wed, 20 Jan 2010 11:50:56 -0800

We handle paging through large result sets by executing the query once for the 
maximum number of results, caching the result set, and then paginating from the 
cache without re-executing the query.
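
As a rough, library-independent sketch of that idea (not Lucene.Net code): the
class name CachedPager, the run_query callback and the (doc_id, score) hit
shape below are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for a search hit: (doc_id, score).
Hit = Tuple[int, float]


class CachedPager:
    """Runs each query once and serves every page from the cached result list."""

    def __init__(self, run_query: Callable[[str, int], List[Hit]], max_results: int):
        self._run_query = run_query      # assumed search function, not a Lucene API
        self._max_results = max_results  # upper bound on hits fetched per query
        self._cache: Dict[str, List[Hit]] = {}

    def page(self, query: str, page_index: int, page_size: int) -> List[Hit]:
        # Execute the query only on a cache miss.
        if query not in self._cache:
            self._cache[query] = self._run_query(query, self._max_results)
        start = page_index * page_size
        return self._cache[query][start:start + page_size]


calls = []  # records how often the fake search is actually executed


def fake_search(query: str, limit: int) -> List[Hit]:
    calls.append(query)
    # 50 fake hits with descending scores, truncated to the requested limit.
    return [(doc_id, 1.0 / (doc_id + 1)) for doc_id in range(50)][:limit]


pager = CachedPager(fake_search, max_results=40)
first = pager.page("lucene", 0, 10)
fourth = pager.page("lucene", 3, 10)
```

Both pages come from a single execution of fake_search. A real cache would
also need some eviction policy and invalidation when the index changes, which
this sketch leaves out.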

Michael

-----Original Message-----
From: Moray McConnachie [mailto:[email protected]] 
Sent: Wednesday, January 20, 2010 1:20 AM
To: [email protected]
Subject: Re: What is the right way for getting paged results with Lucene?

We use a similar method for paging, i.e. we retrieve the top (pageNo*pageSize) 
documents and display the last pageSize of them.

Unless you add state-keeping to your searches, the search will always need to 
recalculate all results above your current page in order to determine where 
your current page starts. 
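
A minimal sketch of that "collect the top (pageNo*pageSize), show the last
pageSize" pattern, independent of Lucene: the function name fetch_page and the
(doc_id, score) hit shape are assumptions, and heapq.nlargest stands in for
the collector's priority queue.

```python
import heapq
from typing import Iterable, List, Tuple

Hit = Tuple[int, float]  # hypothetical (doc_id, score) pair


def fetch_page(hits: Iterable[Hit], page_index: int, page_size: int) -> List[Hit]:
    """Keep the best (page_index + 1) * page_size hits and return the last page of them."""
    needed = (page_index + 1) * page_size
    # Every hit is scanned, but at most `needed` hits are returned, best score first.
    top = heapq.nlargest(needed, hits, key=lambda hit: hit[1])
    return top[page_index * page_size:]


# 20000 fake hits whose score equals their doc id, so the ranking is easy to follow.
all_hits = [(doc_id, float(doc_id)) for doc_id in range(20000)]
first_page = fetch_page(all_hits, 0, 20)
last_page = fetch_page(all_hits, 999, 20)
```

With 20000 hits and a page size of 20, the last page (index 999) requires
keeping all 20000 hits, while the first page keeps only 20. The work grows
with the page number, not with the page size.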

Depending on your sort order you could modify your query for subsequent pages 
using a range query.
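
One way to picture the range-query idea, as a sketch only: it assumes results
are sorted on a field with a unique tie-breaker, modelled here as a
hypothetical (sort_key, doc_id) tuple, and it filters an in-memory list
instead of issuing a real range query.

```python
from typing import List, Optional, Tuple

Doc = Tuple[str, int]  # hypothetical (sort_key, doc_id); sort_key could be a date string


def next_page(docs: List[Doc], after: Optional[Doc], page_size: int) -> List[Doc]:
    """Return the next page_size docs strictly after `after` in (sort_key, doc_id) order."""
    # The comparison plays the role of the range restriction added to the query.
    candidates = [doc for doc in docs if after is None or doc > after]
    return sorted(candidates)[:page_size]


docs = [("2010-01-%02d" % day, day) for day in range(1, 11)]
page1 = next_page(docs, None, 4)
page2 = next_page(docs, page1[-1], 4)   # continue after the last doc of page 1
page3 = next_page(docs, page2[-1], 4)
```

Each request only has to remember the last document of the previous page.
This fits sorting by a field such as a date, and is much harder to apply when
sorting by relevance score.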

But I would not necessarily expect this to be significantly faster. At least in 
our app, the extra overhead of fetching the last page of 100 000 hits versus the 
first page is less than one tenth of a second. Simone is right that in many, 
perhaps most, applications only the first few pages are normally accessed. 

------------------
Moray McConnachie
Director of IT,
Oxford Analytica

-----Original Message-----
From: Simone Chiaretta <[email protected]>
Date: Wed, 20 Jan 2010 09:28:39 
To: <[email protected]>
Subject: Re: What is the right way for getting paged results with Lucene?

Actually I think this is a false problem:
how many times do you go to page 3 of Google? I never do: if I don't
find something useful I just change keywords.
Depending on how many results you have on a page, the last results
might not even be relevant: I've noticed that after a while the score
of docs drops drastically. I filter out docs with a normalized score
lower than 0.2, so I rarely have more than 100 results.
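
A small sketch of such a score cut-off, assuming "normalized" means each score
divided by the best score in the result set (the message does not say how the
normalization is done); the function name and the (doc_id, raw_score) hit
shape are hypothetical.

```python
from typing import List, Tuple

Hit = Tuple[int, float]  # hypothetical (doc_id, raw_score) pair


def filter_by_normalized_score(hits: List[Hit], threshold: float = 0.2) -> List[Hit]:
    """Drop hits whose score, divided by the best score, falls below threshold."""
    if not hits:
        return []
    best = max(score for _, score in hits)
    if best <= 0:
        return []  # nothing meaningful to normalize against
    return [(doc_id, score) for doc_id, score in hits if score / best >= threshold]


raw_hits = [(1, 4.0), (2, 2.0), (3, 1.0), (4, 0.4)]
kept = filter_by_normalized_score(raw_hits)
```

Here the normalized scores are 1.0, 0.5, 0.25 and 0.1, so only the last hit is
dropped.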

I know I didn't answer your question, actually there is no way afaik
to get paged results, but this is how I dealt with "paging"

simo

On Wednesday, January 20, 2010, Markus Wolters <[email protected]> wrote:
> Hello,
>
> as you might know, I am pretty new to Lucene and integrating 2.9.1 right now
> into my current ASP.NET MVC project. And it's really working like a charm.
> (Thanks to Michael)
>
> I am curious whether I've understood correctly how to do paging with
> Lucene. I've implemented it like so:
>
>     IndexSearcher searcher = Searcher;
>
>     // Collect all resulting documents until selected page
>     TopScoreDocCollector collector =
>         TopScoreDocCollector.create((pageIndex + 1) * pageSize, false);
>     searcher.Search(query, collector);
>
>     // Get documents for selected page
>     TopDocs hits = collector.TopDocs(pageIndex * pageSize, pageSize);
>
> So if someone selects one of the last pages of a huge result set, Lucene
> would go over a lot of results, even though I just need 'pageSize' results,
> is that right? What about the performance or memory usage? I took a sneak
> peek into the Searcher code and believe I saw that Lucene creates a queue
> as big as the number of documents to get. So with a total hit count of,
> let's say, 20000, a pageSize of 20 and selecting the last page (999), even
> though I actually need just the last 20 documents, Lucene is getting (and
> even allocating memory for?) all 20000 resulting documents. Is that right?
>
> In other words, I want to do the equivalent of MySQL's SELECT [...] LIMIT
> pageIndex * pageSize, pageSize.
>
> Markus
>
>
>

-- 
Simone Chiaretta
Microsoft MVP ASP.NET - ASPInsider
Blog: http://codeclimber.net.nz
RSS: http://feeds2.feedburner.com/codeclimber
twitter: @simonech

Any sufficiently advanced technology is indistinguishable from magic
"Life is short, play hard"
