The collect() method is invoked once for each document that matches the query (that is, each document with a nonzero score). If the index is very large, that can turn out to be a very large number of calls. Search applications often fetch additional data (doc fields) for only a small subset of the documents matching a query, e.g. the first page (0-9), the second page (10-19), etc. But if your application is going to fetch exhaustively, and especially for a short field like DB_ID, it sometimes makes sense to cache the entire field in memory (its values for all the docs) for the entire life of the index reader/searcher. The collect() method can then read from that cache instead of loading documents.
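A minimal sketch of that idea, with no Lucene dependency so it can be run as is. All names here (FieldCacheSketch, IdCache, CachingCollector, Hit, and the loadField function) are made up for illustration; loadField is only a stand-in for however your application reads the field of a single document from the index reader.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

class FieldCacheSketch {

    /** One result: the cached field value plus the score. */
    static final class Hit {
        final String id;
        final float score;
        Hit(String id, float score) { this.id = id; this.score = score; }
    }

    /** Holds one field's value for every doc, indexed by doc number. */
    static final class IdCache {
        private final String[] values;

        // loadField is a hypothetical stand-in for reading the field of a
        // single document from the index reader. It is called once per doc,
        // when the cache is built, and never during collection.
        IdCache(int maxDoc, IntFunction<String> loadField) {
            values = new String[maxDoc];
            for (int doc = 0; doc < maxDoc; doc++) {
                values[doc] = loadField.apply(doc);
            }
        }

        String get(int doc) { return values[doc]; }
    }

    /** Mirrors the shape of collect(int doc, float score). */
    static final class CachingCollector {
        private final IdCache cache;
        final List<Hit> hits = new ArrayList<>();

        CachingCollector(IdCache cache) { this.cache = cache; }

        // Only an array lookup per matching doc: no document is loaded here.
        public void collect(int doc, float score) {
            hits.add(new Hit(cache.get(doc), score));
        }
    }

    public static void main(String[] args) {
        // Build the cache once, for as long as the reader/searcher stays open.
        IdCache cache = new IdCache(5, doc -> "mail-" + doc);

        CachingCollector collector = new CachingCollector(cache);
        // Pretend the search matched docs 3 and 1.
        collector.collect(3, 0.9f);
        collector.collect(1, 0.4f);

        for (Hit h : collector.hits) {
            System.out.println(h.id + " " + h.score);
        }
    }
}
```

The cache costs one entry per document in the index, so this trade only pays off when the field is short and most searches need it for many hits. It also has to be rebuilt whenever the reader is reopened.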
Lucene maintains and uses a field cache for sorting by fields. But (AFAIK) this capability is not open for use for general application purposes like the one here. You should be able to implement that yourself though. Try searching the list for field caching for some useful discussions and pointers -
http://www.nabble.com/forum/Search.jtp?query=fields+caching&local=y&forum=45&daterange=0&startdate=&enddate=

Doron

Antony Bowesman <[EMAIL PROTECTED]> wrote on 27/02/2007 13:14:12:

> I am doing what I should not, i.e. iterating the Hits after a search to collect
> two ID fields from each document in Hits to pass back to the searcher along with
> the score.
>
> The index is approx 10-15 fields per doc, and indexes mail data, which is not
> stored, as it exists elsewhere. Each mail has a unique object ID, so that gets
> indexed as the field "contentid".
>
> I have been looking at HitCollector, but I was wondering the best way to collect
> the contentId field and score.
>
> In HitCollector javadoc is says that you should not use
> IndexReader.getDocument(doc) during the collection loop, but is there any
> difference between
>
> searcher.search(query, new HitCollector() {
>     public void collect(int doc, float score) {
>         bits.set(doc);
>     }
> });
>
> iterate bitset {
>     IndexReader.getDocument(doc, FieldSelector)
>     saveContentId()
> }
>
> and
>
> searcher.search(query, new HitCollector() {
>     public void collect(int doc, float score) {
>         IndexReader.getDocument(doc, FieldSelector)
>         saveContentId()
>     }
> });
>
> Given that I have to read the documents to get the relevant fields, does either
> method work?
>
> Thanks
> Antony