well ... once you have the list of all "category" names that appear in docs matching your original query, you can either redo the original query with "and category:XXXX" once per category to get the counts, or you can pre-compute (and save) a BitSet for each category in your index (easy to build using a HitCollector or a Filter), and find the cardinality of the intersection of each of those BitSets with a BitSet from your search (again: using a HitCollector on your original query).
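a rough sketch of the BitSet intersection idea, in case it helps. this is NOT Lucene code: plain java.util.BitSet objects stand in for the bits you would collect from the index, and the names CategoryCounts, countPerCategory and bitsOf (plus the doc ids) are made up for illustration.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: hand-built BitSets stand in for bits collected from a Lucene
// index (one cached BitSet per category, plus one for the search itself).
class CategoryCounts {

    // For each category, count the docs that are both in that category and
    // in the search result: cardinality of (categoryBits AND queryBits).
    static Map<String, Integer> countPerCategory(Map<String, BitSet> categoryBits,
                                                 BitSet queryBits) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> e : categoryBits.entrySet()) {
            // and() modifies its receiver, so work on a copy of the cached set
            BitSet overlap = (BitSet) e.getValue().clone();
            overlap.and(queryBits);
            counts.put(e.getKey(), overlap.cardinality());
        }
        return counts;
    }

    // Hypothetical helper: build a BitSet from a list of doc ids.
    static BitSet bitsOf(int... docIds) {
        BitSet bits = new BitSet();
        for (int id : docIds) {
            bits.set(id);
        }
        return bits;
    }

    public static void main(String[] args) {
        Map<String, BitSet> categoryBits = new LinkedHashMap<>();
        categoryBits.put("search engines", bitsOf(1, 2, 3, 5));
        categoryBits.put("open source", bitsOf(4, 6, 7));
        BitSet queryBits = bitsOf(1, 2, 3, 4, 7); // pretend these docs match the query
        System.out.println(countPerCategory(categoryBits, queryBits));
        // prints {search engines=3, open source=2}
    }
}
```

the clone() matters: and() changes the BitSet it is called on, so without the copy the saved per-category BitSets would be wrecked after the first search.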
for the record: this is not a trivial task. i've described the bare basics of the issue ... but there's a lot of processing going on to get these kinds of numbers. if you search the list for "category" and "count" you'll find this has come up at least one other time in the last few months.

: Date: Thu, 5 May 2005 20:37:19 +0200
: From: Pablo Gomes Ludermir <[EMAIL PROTECTED]>
: Reply-To: java-user@lucene.apache.org,
:     Pablo Gomes Ludermir <[EMAIL PROTECTED]>
: To: java-user@lucene.apache.org
: Subject: Re: categorized search
:
: Chris,
:
: That was partially what I needed. You got it right when I said I
: needed the number of categories that a particular term appears in (and it
: works).
: But, I also would like to know in how many documents in each category
: that term appears.
:
: For instance: title:lucene appears in the category "search engines"
: and "open source software", and it appears in the documents 1, 2 and 3
: in the category "search engines" and in documents 4 and 7 in the
: category "open source". I could not get it to work yet (maybe because
: of my lack of experience with Lucene).
: Could someone give me a hand?
: Thanks
: Pablo
:
: On 4/24/05, Chris Hostetter <[EMAIL PROTECTED]> wrote:
: >
: > : >I have indexed a field that describes the "category" of the document.
: > : >Thus, I want to know how many categories have a specific term. Could
: > : >someone help me to get this with good performance?
: >
: > I think I'm reading this question differently than Chuck, so I'll toss out
: > something totally different...
: >
: > as I understand it, you've indexed a bunch of documents, with a variety of
: > fields, one of which is "category" (for example, maybe you are indexing
: > news articles, that each have a "title", "description", "url", and
: > "category"). Now you have a term like "title:lucene" (or
: > "description:pope") and you want to know the number of unique terms in the
: > category field that exist in articles that contain your input term.
: >
: > If that's what you're looking for, then you can probably achieve this by:
: >   1) make a TermQuery for your input term (ie: "title:lucene")
: >   2) put that TermQuery in a QueryFilter, and call bits(reader)
: >   3) call FieldCache.DEFAULT.getStrings(reader,"category")
: >   4) loop over the true bits in the BitSet from #2, and for each one, add
: >      the corresponding entry from the String[] in #3 to a Set.
: >
: > when you're all done, the Set will be the list of categories, and the size
: > of that Set is the number (i think) you wanted.
: >
: > (DISCLAIMER: I've never actually used FieldCache, i'm just giving you my
: > advice based on reading the javadocs)
: >
: > -Hoss
: >
: >
: > ---------------------------------------------------------------------
: > To unsubscribe, e-mail: [EMAIL PROTECTED]
: > For additional commands, e-mail: [EMAIL PROTECTED]
: >
:
: --
: Pablo Gomes Ludermir
: [EMAIL PROTECTED]
:

-Hoss
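
PS: the last step of the quoted recipe (loop over the set bits, look up each doc's category, collect into a Set) might look roughly like the sketch below. it assumes you already have the two inputs in hand: a hand-built BitSet stands in for what QueryFilter's bits(reader) would give you, and a plain String[] indexed by doc id stands in for the FieldCache result. DistinctCategories and categoriesOf are made-up names, and none of this touches a real index.

```java
import java.util.BitSet;
import java.util.Set;
import java.util.TreeSet;

// Sketch only: the BitSet and String[] are built by hand here; in a real
// setup they would come from the filter and the field cache respectively.
class DistinctCategories {

    // Collect the category of every doc whose bit is set.
    static Set<String> categoriesOf(BitSet matchingDocs, String[] categoryByDoc) {
        Set<String> found = new TreeSet<>();
        // nextSetBit() jumps straight from one set bit to the next
        for (int doc = matchingDocs.nextSetBit(0); doc >= 0;
                 doc = matchingDocs.nextSetBit(doc + 1)) {
            // skip docs outside the array or without a category value
            if (doc < categoryByDoc.length && categoryByDoc[doc] != null) {
                found.add(categoryByDoc[doc]);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // index = doc id, value = that doc's category (null = no category)
        String[] categoryByDoc = {
            null, "search engines", "search engines", "search engines",
            "open source", "databases", "open source", "open source"
        };
        BitSet matchingDocs = new BitSet();
        for (int id : new int[] {1, 2, 3, 4, 7}) {
            matchingDocs.set(id);
        }
        Set<String> categories = categoriesOf(matchingDocs, categoryByDoc);
        System.out.println(categories.size() + " " + categories);
        // prints 2 [open source, search engines]
    }
}
```

swapping the Set for a Map from category to a running count in the same loop would give the per-category document counts as well.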