You might have some luck searching the mailing list for "faceted search", as I remember there's been quite a discussion on that topic and I *think* it applies...
Even if you use a HitCollector, you still have to categorize your document, and all you have is the doc id to work with. But I think you'll be able to combine a HitCollector with TermDocs and be pretty quick about getting your results. Something like this in your hitcollector.... TermDocs td = new TermDocs(); td.seek(new Term("objecttype", "coupons"); if (td.skipTo(docid) && td.doc == docId) { increment coupons counter } And repeat ad nauseum. But what I'd really do is just collect an ordered list of all the doc IDs in my hitcollector and *then* do something like... TermDocs tdCoupons = new TermDocs(); TermDocs tdMovies = new TermDocs(); tdCoupons.seek(new Term("objecttype", "coupons"); tdMovies.seek(new Term("objecttype", "movies"); for (int docId : setOfAllDocIds) { if (tdCoupons.skipTo(docId) && tdCoupons.doc == docId) { increment coupon counter; } if (tdMovies.skipTo(docId) && tdMovies.doc == docId) { increment movie counter } } That way, you're progressing through all of the TermDocs in order and not skipping around so much. I really have no clue how much more efficient this would be...... Best Erick On 2/22/07, Phillip Rhodes <[EMAIL PROTECTED]> wrote:
I have a query that can return documents that represent different types of things (e.g. books, movies, coupons, etc) There is a "object_type" keyword on each document, so I can tell that a document is a coupon or a book etc... The problem is that I need to display a count of each item type that was found. For example, your searched returned: 67 coupons, 54 movies, 28 books... While I can loop through each document and increment some sort of counter by document type, sometimes I have over a 2000 documents, and this would mean that the query would be executed internally by lucene 20 times (for every 100 records). I am looking at the HitCollector, but since I would need to still get each and every document (to figure out if it's a coupon vs. a book), I am not sure if there would be any benefits. Would using a HitCollector cause the query to be run only 1x vs. 20 for 2000 documents? Would that be the only benefit? I would be interested in hearing what others think about this problem and how to best implement this functionality with lucene. Thank you. Phillip --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]