OK: if I query for only 10 results, will Lucene store all the results up to the max in the cache anyway?
Simo
On Thu, Jan 21, 2010 at 2:12 AM, Michael Garski <[email protected]> wrote:

> We have a cache system that stores results outside of Lucene, keyed on the
> criteria in the query. When a query is executed we first check the cache to
> determine if we already have results for the query; if so, we return the
> results from the cache. The results cache is used both for pagination and
> to reduce load on the search servers for repetitive queries. Utilizing
> cached results allows us to get the results of page 3 without having to
> re-execute the search. [An illustrative sketch of this kind of cache is
> appended after the thread.]
>
> Michael
>
> -----Original Message-----
> From: Simone Chiaretta [mailto:[email protected]]
> Sent: Wednesday, January 20, 2010 3:34 PM
> To: [email protected]
> Subject: Re: What is the right way for getting paged results with Lucene?
>
> And is this done transparently?
> From what you say it seems that increasing the max result at every page
> does not enable this feature.
> Simo
>
> On Wednesday, January 20, 2010, Michael Garski <[email protected]> wrote:
> > We handle paging through large result sets by executing a query for the
> > maximum number of results, then caching the result set and performing the
> > pagination through the cache without re-executing the query.
> >
> > Michael
> >
> > -----Original Message-----
> > From: Moray McConnachie [mailto:[email protected]]
> > Sent: Wednesday, January 20, 2010 1:20 AM
> > To: [email protected]
> > Subject: Re: What is the right way for getting paged results with Lucene?
> >
> > We use a similar method for paging, i.e. we retrieve (pageNo * pageSize)
> > documents and display the bottom pageSize.
> >
> > Unless you add some way of keeping state across searches, you will always
> > need the search to recalculate all results above your current page in
> > order to determine where your current page starts.
> >
> > Depending on your sort order, you could modify your query for subsequent
> > pages using a range query. [An illustrative sketch of this variant is
> > also appended after the thread.]
> >
> > But I would not necessarily expect this to be significantly faster. At
> > least in our app, the extra overhead of the last page in 100,000 hits vs.
> > the first page is less than one tenth of a second. Simone is right that
> > in many, perhaps most, applications only the first few pages are normally
> > accessed.
> >
> > ------------------
> > Moray McConnachie
> > Director of IT,
> > Oxford Analytica
> >
> > -----Original Message-----
> > From: Simone Chiaretta <[email protected]>
> > Date: Wed, 20 Jan 2010 09:28:39
> > To: <[email protected]>
> > Subject: Re: What is the right way for getting paged results with Lucene?
> >
> > Actually I think this is a false problem:
> > how many times do you go to page 3 of Google? I never do: if I don't find
> > something useful I just change keywords.
> > Depending on how many results you have on a page, the last results might
> > not even be relevant: I've noticed that after a while the score of docs
> > drops drastically, so I filter out docs with a normalized score lower
> > than 0.2, and I rarely have more than 100 results.
> >
> > I know I didn't answer your question; as far as I know there is no way to
> > get paged results directly, but this is how I dealt with "paging".
> >
> > simo
> >
> > On Wednesday, January 20, 2010, Markus Wolters <[email protected]> wrote:
> >> Hello,
> >>
> >> as you might know, I am pretty new to Lucene and am integrating 2.9.1
> >> right now into my current ASP.NET MVC project. And it's really working
> >> like a charm. (Thanks to Michael)
> >>
> >> I am curious whether I've understood correctly how to do paging with
> >> Lucene. I've implemented it like so:
> >>
> >> IndexSearcher searcher = Searcher;
> >>
> >> // Collect all resulting documents up to and including the selected page
> >> TopScoreDocCollector collector =
> >>     TopScoreDocCollector.create((pageIndex + 1) * pageSize, false);
> >> searcher.Search(query, collector);
> >>
> >> // Get the documents for the selected page
> >> TopDocs hits = collector.TopDocs(pageIndex * pageSize, pageSize);
> >>
> >> So if someone selects one of the last pages of a huge result set, Lucene
> >> would go over a lot of results even though I just need 'pageSize'
> >> results, is that right? What about performance and memory usage? I took
> >> a sneak peek into the Searcher code and believe I saw that Lucene
> >> creates a queue as big as the number of documents to get. So with a
> >> total hit count of, let's say, 20,000, a pageSize of 20 and the last
> >> page (999) selected, even though I actually need just the last 20
> >> documents, Lucene is getting (and even allocating memory for?) all
> >> 20,000 resulting documents. Is that right?
> >>
> >> In other words, I want the MySQL equivalent of SELECT [...] LIMIT
> >> pageIndex * pageSize, pageSize.
> >>
> >> Markus
> >>
>

--
Simone Chiaretta
Microsoft MVP ASP.NET - ASPInsider
Blog: http://codeclimber.net.nz
RSS: http://feeds2.feedburner.com/codeclimber
twitter: @simonech

Any sufficiently advanced technology is indistinguishable from magic
"Life is short, play hard"
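
A minimal sketch of the external results cache Michael describes, assuming Lucene.NET 2.9 and reusing only the calls that appear in Markus's snippet. The ResultsCache class, the ToString()-based cache key, and maxResults are illustrative assumptions, not Michael's actual implementation:

// Illustrative only: run the query once for up to maxResults hits, keep the
// ScoreDocs outside Lucene keyed on the query criteria, and serve later
// pages from that cache without re-executing the search.
using System;
using System.Collections.Generic;
using Lucene.Net.Search;

public class ResultsCache
{
    private readonly Dictionary<string, ScoreDoc[]> cache =
        new Dictionary<string, ScoreDoc[]>();

    public ScoreDoc[] GetPage(IndexSearcher searcher, Query query,
                              int pageIndex, int pageSize, int maxResults)
    {
        // Crude cache key derived from the query criteria; a real system
        // would also account for sort order, filters, eviction and locking.
        string key = query.ToString();

        ScoreDoc[] all;
        if (!cache.TryGetValue(key, out all))
        {
            // First request for this query: collect the top maxResults once.
            TopScoreDocCollector collector =
                TopScoreDocCollector.create(maxResults, false);
            searcher.Search(query, collector);
            // The member may be named ScoreDocs in other Lucene.NET versions.
            all = collector.TopDocs(0, maxResults).scoreDocs;
            cache[key] = all;
        }

        // Slice the requested page out of the cached hits.
        int start = pageIndex * pageSize;
        int count = Math.Max(0, Math.Min(pageSize, all.Length - start));
        ScoreDoc[] page = new ScoreDoc[count];
        Array.Copy(all, start, page, 0, count);
        return page;
    }
}

Note that, as Michael describes it, the first execution still collects up to maxResults hits; the saving applies to subsequent page requests, which are served from the cache without touching the searcher again.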
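
And a sketch of the range-query variant Moray mentions, under the assumption that results are sorted by an indexed string field whose values order lexicographically (for example a date stored as "yyyyMMdd"). The helper below is hypothetical; it only illustrates the idea of remembering the last value shown on the previous page and constraining the next page's query to values after it, so earlier pages never need to be collected again:

// Illustrative only: keyset-style paging layered on the original query.
// Assumes ties on the sort field are either absent or resolved by a
// secondary key; otherwise documents sharing the boundary value could be
// skipped or repeated.
using Lucene.Net.Search;

public static class RangePaging
{
    public static Query NextPageQuery(Query original, string sortField,
                                      string lastValueOnPreviousPage)
    {
        // Everything strictly after the last value already displayed
        // (null upper bound = open-ended).
        TermRangeQuery after = new TermRangeQuery(
            sortField, lastValueOnPreviousPage, null, false, false);

        BooleanQuery combined = new BooleanQuery();
        combined.Add(original, BooleanClause.Occur.MUST);
        combined.Add(after, BooleanClause.Occur.MUST);
        return combined;
    }
}

With a query shaped like this, each page can be collected with a collector sized to just pageSize hits (searched with the same field sort) instead of (pageIndex + 1) * pageSize, at the cost of carrying the boundary value between requests, which is the kind of state Moray notes the search otherwise has to recompute.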
