RE: What is the right way for getting paged results with Lucene?

Moray McConnachie Thu, 21 Jan 2010 02:07:08 -0800

We have just introduced results caching (outside Lucene) for frequently 
executed queries with good results.


Depending (of course) on your load and profile of searches, it may be more 
efficient to cache the first few pages of results for more queries than the 
full result set for a smaller number of queries. 

It would be interesting to run some numbers on this, although I don't see that 
they would be of general interest unless for a similar purpose.
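
A minimal sketch of the "first few pages" policy, assuming hits are plain document IDs in rank order. The names PrefixResultCache, GetPage and the search callback are hypothetical and not part of Lucene.Net; eviction and thread safety are left out.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: keep only the first few pages of hits per query.
public class PrefixResultCache
{
    private readonly int pageSize;
    private readonly int cachedPages;
    // Hypothetical search callback: (query, maxHits) -> document IDs in rank order.
    private readonly Func<string, int, int[]> search;
    private readonly Dictionary<string, int[]> prefixes =
        new Dictionary<string, int[]>();

    public PrefixResultCache(int pageSize, int cachedPages,
                             Func<string, int, int[]> search)
    {
        this.pageSize = pageSize;
        this.cachedPages = cachedPages;
        this.search = search;
    }

    public int[] GetPage(string query, int pageIndex)
    {
        int start = pageIndex * pageSize;
        if (pageIndex < cachedPages)
        {
            int[] prefix;
            if (!prefixes.TryGetValue(query, out prefix))
            {
                // First request for this query: fetch and keep the cached prefix.
                prefix = search(query, cachedPages * pageSize);
                prefixes[query] = prefix;
            }
            return Slice(prefix, start, pageSize);
        }
        // Deep page: re-run the search up to the end of the requested page.
        int[] hits = search(query, (pageIndex + 1) * pageSize);
        return Slice(hits, start, pageSize);
    }

    private static int[] Slice(int[] source, int start, int count)
    {
        if (start >= source.Length) return new int[0];
        int n = Math.Min(count, source.Length - start);
        int[] page = new int[n];
        Array.Copy(source, start, page, 0, n);
        return page;
    }
}
```

With pageSize 20 and cachedPages 3, each cached query holds at most 60 IDs, so the same memory covers many more distinct queries than full result sets would, at the price of a fresh search for every page past the third.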

Yours,
Moray
------------------------------------- 
Moray McConnachie
Director of IT    +44 1865 261 600
Oxford Analytica  http://www.oxan.com

-----Original Message-----
From: Michael Garski [mailto:[email protected]] 
Sent: 21 January 2010 01:13
To: [email protected]
Subject: RE: What is the right way for getting paged results with Lucene?

We have a cache system that stores results outside of Lucene keyed on the 
criteria in the query.  When a query is executed we first check the cache to 
determine if we already have results for the query; if so, we return results 
from cache.  The results cache is used both for pagination and to reduce load 
on the search servers for repetitive queries.  Utilizing cached results allows 
us to get the results of page 3 without having to re-execute the search.
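
As an illustration only, and not the actual system described above, a cache keyed on the query criteria could be sketched like this. CacheKey, ResultCache and the runSearch callback are hypothetical names; hits are assumed to be document IDs, and expiry of stale entries is not handled.

```csharp
using System;
using System.Collections.Generic;

public static class CacheKey
{
    // Builds a key from the parts of the request that decide which hits come back.
    // The page number is left out on purpose so every page shares one entry.
    public static string For(string queryText, string sortField, bool descending)
    {
        return queryText.Trim() + "|" + sortField + "|" + (descending ? "desc" : "asc");
    }
}

public class ResultCache
{
    private readonly Dictionary<string, List<int>> entries =
        new Dictionary<string, List<int>>();

    // Number of times the underlying search actually had to run.
    public int Misses { get; private set; }

    public List<int> GetPage(string key, int pageIndex, int pageSize,
                             Func<List<int>> runSearch)
    {
        List<int> all;
        if (!entries.TryGetValue(key, out all))
        {
            // Cache miss: execute the search once and keep the whole result set.
            Misses++;
            all = runSearch();
            entries[key] = all;
        }
        // Clamp the window so a page past the end comes back empty.
        int start = Math.Min(pageIndex * pageSize, all.Count);
        int count = Math.Min(pageSize, all.Count - start);
        return all.GetRange(start, count);
    }
}
```

Requesting page 1 and then page 3 with the same key runs the search once; the second call is served entirely from the cached list.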

Michael

-----Original Message-----
From: Simone Chiaretta [mailto:[email protected]]
Sent: Wednesday, January 20, 2010 3:34 PM
To: [email protected]
Subject: Re: What is the right way for getting paged results with Lucene?

And is this done transparently?
From what you say it seems like increasing the max result at every page is not 
enabling this feature.
Simo

On Wednesday, January 20, 2010, Michael Garski <[email protected]> wrote:
> We handle paging through large result sets by executing a query for the 
> maximum number of results, then cache the result set and perform the 
> pagination through cache without re-execution of the query.
>
> Michael
>
> -----Original Message-----
> From: Moray McConnachie [mailto:[email protected]]
> Sent: Wednesday, January 20, 2010 1:20 AM
> To: [email protected]
> Subject: Re: What is the right way for getting paged results with Lucene?
>
> We use a similar method for paging, i.e. we retrieve (pageNo*pageSize) 
> documents and display the bottom pageSize.
>
> Unless you add state-keeping to searches, you will always need the search 
> to recalculate all results above your current page in order to determine 
> where your current page is.
>
> Depending on your sort order you could modify your query for subsequent pages 
> using a range query.
>
> But I would not necessarily expect this to be significantly faster. At least 
> in our app the extra overhead of the last page in 100 000 hits vs. the first 
> page is less than one tenth of a second. Simone is right that in many, perhaps 
> most, applications only the first few pages are normally accessed.
>
>
> ------------------
> Moray McConnachie
> Director of IT,
> Oxford Analytica
>
>
> -----Original Message-----
> From: Simone Chiaretta <[email protected]>
> Date: Wed, 20 Jan 2010 09:28:39
> To: <[email protected]>
> Subject: Re: What is the right way for getting paged results with Lucene?
>
> Actually I think this is a false problem:
> how many times do you go to page 3 of Google? I never do: if I don't 
> find something useful I just change keywords.
> Depending on how many results you have on a page, the last results 
> might not even be relevant: I've noticed that after a while the score 
> of docs drops drastically. I filter out docs with a normalized score 
> lower than 0.2, so I rarely have more than 100 results.
>
> I know I didn't answer your question, actually there is no way afaik 
> to get paged results, but this is how I dealt with "paging"
>
> simo
>
> On Wednesday, January 20, 2010, Markus Wolters <[email protected]> wrote:
>> Hello,
>>
>> as you might know, I am pretty new to Lucene and integrating 2.9.1 
>> right now into my current ASP.NET MVC project. And it's really working like 
>> a charm.
>> (Thanks to Michael)
>>
>> I am curious whether I've understood correctly how to do paging with 
>> Lucene. I've implemented it like so:
>>
>>     IndexSearcher searcher = Searcher;
>>
>>     // Collect all resulting documents until selected page
>>     TopScoreDocCollector collector =
>>         TopScoreDocCollector.create((pageIndex + 1) * pageSize, false);
>>     searcher.Search(query, collector);
>>
>>     // Get documents for selected page
>>     TopDocs hits = collector.TopDocs(pageIndex * pageSize, pageSize);
>>
>> So if someone selects one of the last pages of a huge result, 
>> Lucene would go over a lot of results, even though I just need 'pageSize' 
>> results, is that right? What about the performance or memory usage? I 
>> took a sneak peek into the Searcher code and believe I have seen 
>> that Lucene is creating a queue as big as the number of documents to get. 
>> So with a total hit count of, let's say, 20000, a pageSize of 20 and 
>> selecting the last page (999), even though I actually need just the 20 
>> last documents, Lucene is getting (and even allocating memory for?) all 
>> 20000 resulting documents. Is that right?
>>
>> In other terms, I want to do a MySQL-equivalent to SELECT [...] LIMIT 
>> pageIndex * pageSize, pageSize.
>>
>> Markus
>>
>>
>>
>
> --
> Simone Chiaretta
> Microsoft MVP ASP.NET - ASPInsider
> Blog: http://codeclimber.net.nz
> RSS: http://feeds2.feedburner.com/codeclimber
> twitter: @simonech
>
> Any sufficiently advanced technology is indistinguishable from magic 
> "Life is short, play hard"
>
>
>

