Thanks for all this. We're doing warmup searching also, but just for some common date searches. The warmup would be a good place to add some pre-caching capability. I'll plan for this eventually and start with the partial cache for now.
Thanks, Brian Beard -----Original Message----- From: Antony Bowesman [mailto:[EMAIL PROTECTED] Sent: Thursday, January 10, 2008 3:19 PM To: java-user@lucene.apache.org Subject: Re: how do I get my own TopDocHitCollector? Beard, Brian wrote: > Ok, I've been thinking about this some more. Is the cache mechanism > pulling from the cache if the external id already exists there and then > hitting the searcher if it's not already in the cache (maybe using a > FieldSelector for just retrieving the external id)? I am warming searchers in background and each search has one or more query related caches. The external Id cache is normally preloaded by simply iterating terms, e.g. String field = fieldName.intern(); final String[] retArray = new String[reader.maxDoc()]; TermDocs termDocs = reader.termDocs(); TermEnum termEnum = reader.terms (new Term (field, "")); try { do { Term term = termEnum.term(); if (term == null || term.field() != field) break; String termval = term.text(); termDocs.seek(termEnum); while (termDocs.next()) { retArray[termDocs.doc()] = termval; } } while (termEnum.next()); } finally { termDocs.close(); termEnum.close(); } return retArray; I do allow for a partial cache, in which case, as you suggest, the searcher uses a FieldSelector to get the external Id from the document which then is stored to cache. Antony > > -----Original Message----- > From: Beard, Brian [mailto:[EMAIL PROTECTED] > Sent: Thursday, January 10, 2008 10:08 AM > To: java-user@lucene.apache.org > Subject: RE: how do I get my own TopDocHitCollector? > > Thanks for the post. So you're using the doc id as the key into the > cache to retrieve the external id. Then what mechanism fetches the > external id's from the searcher and places them in the cache? > > > -----Original Message----- > From: Antony Bowesman [mailto:[EMAIL PROTECTED] > Sent: Wednesday, January 09, 2008 7:19 PM > To: java-user@lucene.apache.org > Subject: Re: how do I get my own TopDocHitCollector? > > Beard, Brian wrote: >> Question: >> >> The documents that I index have two id's - a unique document id and a >> record_id that can link multiple documents together that belong to a >> common record. >> >> I'd like to use something like TopDocs to return the first 1024 > results >> that have unique record_id's, but I will want to skip some of the >> returned documents that have the same record_id. We're using the >> ParallelMultiSearcher. >> >> I read that I could use a HitCollector and throw an exception to get > it >> to stop, but is there a cleaner way? > > I'm doing a similar thing. I have external Ids (equivalent to yout > record_id), > which have one or more Lucene Documents associated with them. I wrote a > custom > HitCollector that uses a Map to hold the so far collected external ids > along > with the collected document. > > I had to write my own priority queue to know when an object was dropped > of the > bottom of the score sorted queue, but the latest PriorityQueue on the > trunk now > has insertWithOverflow(), which does the same thing. > > Note that ResultDoc extends ScoreDoc, so that the external Id of the > item > dropped off the queue can be used to remove it from my Map. > > Code snippet is somewhat as below (I am caching my external Ids, hence > the cache > usage) > > protected Map<OfficeId, ScoreDoc> results; > > public void collect(int doc, float score) > { > if (score > 0.0f) > { > totalHits++; > if (pq.size() < numHits || score > minScore) > { > OfficeId id = cache.get(doc); > ResultDoc rd = results.get(id); > // No current result for this ID yet found > if (rd == null) > { > rd = new ResultDoc(id, doc, score); > ResultDoc added = pq.insert(rd); > if (added == null) > { > // Nothing dropped of the bottom > results.put(id, rd); > } > else > { > // Return value dropped of the bottom > results.remove(added.id); > results.put(id, rd); > remaining++; > } > } > // Already found this ID, so replace high score if > necessary > else > { > if (score > rd.score) > { > pq.remove(rd); > rd.score = score; > pq.insert(rd); > } > } > // Readjust our minimum score again from the top entry > minScore = pq.peek().score; > } > else > remaining++; > } > } > > HTH > Antony > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]