OK: if I query for only 10 results, will Lucene store all the results up to the max in the cache anyway?
Simo
On Thu, Jan 21, 2010 at 2:12 AM, Michael Garski <[email protected]> wrote:

> We have a cache system that stores results outside of Lucene, keyed on the
> criteria in the query. When a query is executed we first check the cache to
> determine if we already have results for the query; if so, we return the
> results from the cache. The results cache is used both for pagination and
> to reduce load on the search servers for repetitive queries. Utilizing
> cached results allows us to get the results of page 3 without having to
> re-execute the search. [An illustrative sketch of this kind of cache is
> appended after the thread.]
>
> Michael
>
> -----Original Message-----
> From: Simone Chiaretta [mailto:[email protected]]
> Sent: Wednesday, January 20, 2010 3:34 PM
> To: [email protected]
> Subject: Re: What is the right way for getting paged results with Lucene?
>
> And is this done transparently?
> From what you say it seems that increasing the max result at every page
> does not enable this feature.
> Simo
>
> On Wednesday, January 20, 2010, Michael Garski <[email protected]> wrote:
> > We handle paging through large result sets by executing a query for the
> > maximum number of results, then caching the result set and performing the
> > pagination through the cache without re-executing the query.
> >
> > Michael
> >
> > -----Original Message-----
> > From: Moray McConnachie [mailto:[email protected]]
> > Sent: Wednesday, January 20, 2010 1:20 AM
> > To: [email protected]
> > Subject: Re: What is the right way for getting paged results with Lucene?
> >
> > We use a similar method for paging, i.e. we retrieve (pageNo * pageSize)
> > documents and display the bottom pageSize.
> >
> > Unless you add some way of keeping state across searches, you will always
> > need the search to recalculate all results above your current page in
> > order to determine where your current page starts.
> >
> > Depending on your sort order, you could modify your query for subsequent
> > pages using a range query. [An illustrative sketch of this variant is
> > also appended after the thread.]
> >
> > But I would not necessarily expect this to be significantly faster. At
> > least in our app, the extra overhead of the last page in 100,000 hits vs.
> > the first page is less than one tenth of a second. Simone is right that
> > in many, perhaps most, applications only the first few pages are normally
> > accessed.
> >
> > ------------------
> > Moray McConnachie
> > Director of IT,
> > Oxford Analytica
> >
> > -----Original Message-----
> > From: Simone Chiaretta <[email protected]>
> > Date: Wed, 20 Jan 2010 09:28:39
> > To: <[email protected]>
> > Subject: Re: What is the right way for getting paged results with Lucene?
> >
> > Actually I think this is a false problem:
> > how many times do you go to page 3 of Google? I never do: if I don't find
> > something useful I just change keywords.
> > Depending on how many results you have on a page, the last results might
> > not even be relevant: I've noticed that after a while the score of docs
> > drops drastically, so I filter out docs with a normalized score lower
> > than 0.2, and I rarely have more than 100 results.
> >
> > I know I didn't answer your question; as far as I know there is no way to
> > get paged results directly, but this is how I dealt with "paging".
> >
> > simo
> >
> > On Wednesday, January 20, 2010, Markus Wolters <[email protected]> wrote:
> >> Hello,
> >>
> >> as you might know, I am pretty new to Lucene and am integrating 2.9.1
> >> right now into my current ASP.NET MVC project. And it's really working
> >> like a charm. (Thanks to Michael)
> >>
> >> I am curious whether I've understood correctly how to do paging with
> >> Lucene. I've implemented it like so:
> >>
> >> IndexSearcher searcher = Searcher;
> >>
> >> // Collect all resulting documents up to and including the selected page
> >> TopScoreDocCollector collector =
> >>     TopScoreDocCollector.create((pageIndex + 1) * pageSize, false);
> >> searcher.Search(query, collector);
> >>
> >> // Get the documents for the selected page
> >> TopDocs hits = collector.TopDocs(pageIndex * pageSize, pageSize);
> >>
> >> So if someone selects one of the last pages of a huge result set, Lucene
> >> would go over a lot of results even though I just need 'pageSize'
> >> results, is that right? What about performance and memory usage? I took
> >> a sneak peek into the Searcher code and believe I saw that Lucene
> >> creates a queue as big as the number of documents to get. So with a
> >> total hit count of, let's say, 20,000, a pageSize of 20 and the last
> >> page (999) selected, even though I actually need just the last 20
> >> documents, Lucene is getting (and even allocating memory for?) all
> >> 20,000 resulting documents. Is that right?
> >>
> >> In other words, I want the MySQL equivalent of SELECT [...] LIMIT
> >> pageIndex * pageSize, pageSize.
> >>
> >> Markus
> >>
>

--
Simone Chiaretta
Microsoft MVP ASP.NET - ASPInsider
Blog: http://codeclimber.net.nz
RSS: http://feeds2.feedburner.com/codeclimber
twitter: @simonech

Any sufficiently advanced technology is indistinguishable from magic
"Life is short, play hard"
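
A minimal sketch of the external results cache Michael describes, assuming Lucene.NET 2.9 and reusing only the calls that appear in Markus's snippet. The ResultsCache class, the ToString()-based cache key, and maxResults are illustrative assumptions, not Michael's actual implementation:

// Illustrative only: run the query once for up to maxResults hits, keep the
// ScoreDocs outside Lucene keyed on the query criteria, and serve later
// pages from that cache without re-executing the search.
using System;
using System.Collections.Generic;
using Lucene.Net.Search;

public class ResultsCache
{
    private readonly Dictionary<string, ScoreDoc[]> cache =
        new Dictionary<string, ScoreDoc[]>();

    public ScoreDoc[] GetPage(IndexSearcher searcher, Query query,
                              int pageIndex, int pageSize, int maxResults)
    {
        // Crude cache key derived from the query criteria; a real system
        // would also account for sort order, filters, eviction and locking.
        string key = query.ToString();

        ScoreDoc[] all;
        if (!cache.TryGetValue(key, out all))
        {
            // First request for this query: collect the top maxResults once.
            TopScoreDocCollector collector =
                TopScoreDocCollector.create(maxResults, false);
            searcher.Search(query, collector);
            // The member may be named ScoreDocs in other Lucene.NET versions.
            all = collector.TopDocs(0, maxResults).scoreDocs;
            cache[key] = all;
        }

        // Slice the requested page out of the cached hits.
        int start = pageIndex * pageSize;
        int count = Math.Max(0, Math.Min(pageSize, all.Length - start));
        ScoreDoc[] page = new ScoreDoc[count];
        Array.Copy(all, start, page, 0, count);
        return page;
    }
}

Note that, as Michael describes it, the first execution still collects up to maxResults hits; the saving applies to subsequent page requests, which are served from the cache without touching the searcher again.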
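
And a sketch of the range-query variant Moray mentions, under the assumption that results are sorted by an indexed string field whose values order lexicographically (for example a date stored as "yyyyMMdd"). The helper below is hypothetical; it only illustrates the idea of remembering the last value shown on the previous page and constraining the next page's query to values after it, so earlier pages never need to be collected again:

// Illustrative only: keyset-style paging layered on the original query.
// Assumes ties on the sort field are either absent or resolved by a
// secondary key; otherwise documents sharing the boundary value could be
// skipped or repeated.
using Lucene.Net.Search;

public static class RangePaging
{
    public static Query NextPageQuery(Query original, string sortField,
                                      string lastValueOnPreviousPage)
    {
        // Everything strictly after the last value already displayed
        // (null upper bound = open-ended).
        TermRangeQuery after = new TermRangeQuery(
            sortField, lastValueOnPreviousPage, null, false, false);

        BooleanQuery combined = new BooleanQuery();
        combined.Add(original, BooleanClause.Occur.MUST);
        combined.Add(after, BooleanClause.Occur.MUST);
        return combined;
    }
}

With a query shaped like this, each page can be collected with a collector sized to just pageSize hits (searched with the same field sort) instead of (pageIndex + 1) * pageSize, at the cost of carrying the boundary value between requests, which is the kind of state Moray notes the search otherwise has to recompute.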
