We handle paging through large result sets by executing the query once for the maximum number of results, caching the result set, and then paginating from the cache without re-executing the query.
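A minimal sketch of this caching approach follows. It is an illustration rather than the implementation described above: `CachedPager`, the `search` callback, and the two size limits are hypothetical names, and it is written in plain Java with no Lucene dependency so that it runs standalone. In a real application the callback would presumably wrap the actual index search and return document ids in rank order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

public class CachedPager {
    private final int maxResults;
    // Hypothetical stand-in for the real search: (query, maxResults) -> doc ids in rank order.
    private final BiFunction<String, Integer, List<Integer>> search;
    private final Map<String, List<Integer>> cache;
    private int executions = 0;

    public CachedPager(int maxResults, int maxCachedQueries,
                       BiFunction<String, Integer, List<Integer>> search) {
        this.maxResults = maxResults;
        this.search = search;
        // An access-ordered LinkedHashMap acts as a small LRU cache of result sets.
        this.cache = new LinkedHashMap<String, List<Integer>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, List<Integer>> eldest) {
                return size() > maxCachedQueries;
            }
        };
    }

    /** Returns one page of hits; the query is executed only on a cache miss. */
    public List<Integer> page(String query, int pageIndex, int pageSize) {
        List<Integer> hits = cache.get(query);
        if (hits == null) {
            executions++;
            hits = new ArrayList<>(search.apply(query, maxResults));
            cache.put(query, hits);
        }
        // Clamp the window so pages past the end come back short or empty.
        long start = (long) pageIndex * pageSize;
        int from = (int) Math.min(start, hits.size());
        int to = (int) Math.min(start + pageSize, hits.size());
        return new ArrayList<>(hits.subList(from, to));
    }

    /** Number of times the underlying search has actually been run. */
    public int executions() {
        return executions;
    }

    public static void main(String[] args) {
        // Fake search for demonstration: "finds" doc ids 0..n-1.
        CachedPager pager = new CachedPager(1000, 8, (q, n) -> {
            List<Integer> ids = new ArrayList<>();
            for (int i = 0; i < n; i++) ids.add(i);
            return ids;
        });
        System.out.println(pager.page("lucene", 0, 5));
        System.out.println(pager.page("lucene", 3, 5));
        System.out.println("searches executed: " + pager.executions());
    }
}
```

The trade-off in this sketch is memory for speed: every cached query holds up to `maxResults` ids, so the LRU bound matters, and cached pages can go stale if the index changes between requests.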
Michael

-----Original Message-----
From: Moray McConnachie [mailto:[email protected]]
Sent: Wednesday, January 20, 2010 1:20 AM
To: [email protected]
Subject: Re: What is the right way for getting paged results with Lucene?

We use a similar method for paging, i.e. we retrieve (pageNo*pageSize) documents and display the bottom pageSize.

Unless you add state-keeping to your searches, the search will always need to recalculate all results above your current page in order to determine where your current page is. Depending on your sort order, you could modify your query for subsequent pages using a range query, but I would not necessarily expect this to be significantly faster. At least in our app, the extra overhead of the last page in 100 000 hits vs. the first page is less than one tenth of a second.

Simone is right that in many, perhaps most, applications only the first few pages are normally accessed.

------------------
Moray McConnachie
Director of IT, Oxford Analytica

-----Original Message-----
From: Simone Chiaretta <[email protected]>
Date: Wed, 20 Jan 2010 09:28:39
To: <[email protected]>
Subject: Re: What is the right way for getting paged results with Lucene?

Actually I think this is a false problem: how many times do you go to page 3 of Google? I never do: if I don't find something useful, I just change keywords.

Depending on how many results you have on a page, the last results might not even be relevant. I've noticed that after a while the score of docs drops drastically. I filter out docs with a normalized score lower than 0.2, so I rarely have more than 100 results.

I know I didn't answer your question (actually there is no way, afaik, to get paged results), but this is how I dealt with "paging".

simo

On Wednesday, January 20, 2010, Markus Wolters <[email protected]> wrote:
> Hello,
>
> as you might know, I am pretty new to Lucene and integrating 2.9.1 right now
> into my current ASP.NET MVC project. And it's really working like a charm.
> (Thanks to Michael)
>
> I am curious whether I've understood correctly how to do paging with
> Lucene. I've implemented it like so:
>
> IndexSearcher searcher = Searcher;
>
> // Collect all resulting documents until selected page
> TopScoreDocCollector collector =
>     TopScoreDocCollector.create((pageIndex + 1) * pageSize, false);
> searcher.Search(query, collector);
>
> // Get documents for selected page
> TopDocs hits = collector.TopDocs(pageIndex * pageSize, pageSize);
>
> So if someone selects one of the last pages of a huge result, Lucene
> would go over a lot of results, even though I just need 'pageSize' results, is
> that right? What about the performance or memory usage? I took a sneak peek
> into the Searcher code and believe I saw that Lucene creates a
> queue as big as the number of documents to get. So in the case of a total hit count of, let's say,
> 20000, a pageSize of 20, and selecting the last page (999), even though I
> actually need just the last 20 documents, Lucene is getting (and even
> allocating memory for?) all 20000 resulting documents. Is that right?
>
> In other terms, I want a MySQL equivalent of SELECT [...] LIMIT
> pageIndex * pageSize, pageSize.
>
> Markus

--
Simone Chiaretta
Microsoft MVP ASP.NET - ASPInsider
Blog: http://codeclimber.net.nz
RSS: http://feeds2.feedburner.com/codeclimber
twitter: @simonech

Any sufficiently advanced technology is indistinguishable from magic
"Life is short, play hard"
