Hello Shailendra,

AFAICS you are reasoning from a static doc-id POV, while documents do not have a static doc-id in Lucene. With a frequently updated index you will end up invalidating your cached BitSets, because segment merges and Lucene optimizations shuffle doc-ids around (actually, doc-ids are renumbered to close the 'holes' left by deleted documents). The cached BitSets can also absorb quite a lot of memory as the number of categories and the number of documents grow.

So, obviously, you need to minimize segment merging, which you can control to some level, but after a merge you still need to compute and cache your BitSets again. For many categories and many documents, this is not fast enough. If your index changes only rarely, you are fine (not sure how Solr handles this, but they of course built faceted navigation into it).

Otherwise, you might try having a more or less "static" persistent index, plus a volatile memory index to which documents are added. When a doc is updated, you need to set its bits in the cached BitSets of the persistent index to 0, since the new version of the doc then lives in the memory index. I think it is not very easy, but it might just work...
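To make the BitSet bookkeeping a bit more concrete, here is a minimal sketch. It uses only java.util.BitSet and no Lucene calls at all; the class CategoryCounter and its methods (cacheCategory, markDeleted, count, bits) are names I made up for illustration, and the doc-ids are assumed to stay valid until the next segment merge. It shows the intersection count per category, and clearing the bits of a doc that was updated in the persistent index.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene: it only illustrates the BitSet bookkeeping.
class CategoryCounter {
    // One cached BitSet per category; only valid until doc-ids are renumbered by a merge.
    private final Map<String, BitSet> categoryBits = new LinkedHashMap<>();

    // Store a private copy of the doc-ids that belong to a category.
    void cacheCategory(String category, BitSet docs) {
        categoryBits.put(category, (BitSet) docs.clone());
    }

    // A doc in the persistent index was updated or deleted: clear its bit in every category.
    void markDeleted(int docId) {
        for (BitSet bits : categoryBits.values()) {
            bits.clear(docId);
        }
    }

    // Intersect the query's hits with each category and count the bits that remain set.
    Map<String, Integer> count(BitSet queryHits) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> entry : categoryBits.entrySet()) {
            BitSet both = (BitSet) entry.getValue().clone(); // keep the cached set untouched
            both.and(queryHits);
            counts.put(entry.getKey(), both.cardinality());
        }
        return counts;
    }

    // Small convenience for building a BitSet from a list of doc-ids.
    static BitSet bits(int... docIds) {
        BitSet set = new BitSet();
        for (int id : docIds) {
            set.set(id);
        }
        return set;
    }

    public static void main(String[] args) {
        CategoryCounter counter = new CategoryCounter();
        counter.cacheCategory("books", bits(0, 2, 5));
        counter.cacheCategory("music", bits(1, 2));

        BitSet queryHits = bits(1, 2, 5); // as a HitCollector would have filled it
        System.out.println(counter.count(queryHits)); // {books=2, music=2}

        counter.markDeleted(2); // doc 2 was updated, so it no longer counts here
        System.out.println(counter.count(queryHits)); // {books=1, music=1}
    }
}
```

In the two-index setup, the counts from the persistent index would still have to be added to counts computed over the volatile memory index, and all cached BitSets would have to be rebuilt whenever the persistent index is merged or optimized.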
Regards Ard

> A better way is following:
> Cache the list of doc-ids for each category - you can cache this in a
> BitSet.. a bit at index "doc-id" is on if the category is present in
> document "doc-id", else it is off.
>
> For user query, you need to calculate the BitSet, similar to above way. This
> can be done in a HitCollector implementation.
>
> Then simply do the intersection of user query's BitSet and each category
> BitSet - find count of "on" bits, this would give you count of documents for
> each category.
>
> The BitSet operations I talked above are already provided in Java, so your
> piece of code would be really small.
>
> Thanks,
> Shailendra Sharma
> CTO, Ver Se' Innovation Private Ltd.
> Bangalore, India
>
> On 7/30/07, Dennis Kubes <[EMAIL PROTECTED]> wrote:
> >
> > We found that a fast way to do this simply by running a query for each
> > category and getting the maxDocs. There would be one query for category
> > getting a single hit.
> >
> > Dennis Kubes
> >
> > Erick Erickson wrote:
> > > You might want to search the mail archive for "facets" or "faceted search"
> > > (no quotes), as I *think* this might be relevant.
> > >
> > > Best
> > > Erick
> > >
> > > On 7/26/07, Ramana Jelda <[EMAIL PROTECTED]> wrote:
> > >> Hi ,
> > >> Of course this statement is very expensive.
> > >> -->document.get("CAMPCATID")==null?"":document.get("CAMPCATID");
> > >>
> > >> Use StringIndex/FieldCache/something similar to implement category
> > >> counting.
> > >> :)
> > >>
> > >> Jelda
> > >>
> > >>> -----Original Message-----
> > >>> From: Bhavin Pandya [mailto:[EMAIL PROTECTED]
> > >>> Sent: Thursday, July 26, 2007 5:20 PM
> > >>> To: java-user@lucene.apache.org
> > >>> Subject: How to show category count with results?
> > >>>
> > >>> Hi,
> > >>>
> > >>> I want to show each category name and its count with results.
> > >>> I achieved this using DocCollector but its very slow when no
> > >>> of results in lacs... As fetching of documents from reader in
> > >>> collect method is expensive...
> > >>>
> > >>> public void collect(int doc, float score) {
> > >>>     Document document = mreader.document(doc);
> > >>>     strcatid =
> > >>>         document.get("CAMPCATID")==null?"":document.get("CAMPCATID");
> > >>>
> > >>>     if (catcountmap.containsKey(strcatid))
> > >>>     {
> > >>>         // catid already exists in hashmap... increase count by one
> > >>>
> > >>>         value = ((Integer)catcountmap.get(strcatid)).intValue();
> > >>>         value = value + 1;
> > >>>         catcountmap.put(strcatid,new Integer(value));
> > >>>     }
> > >>>     else
> > >>>         catcountmap.put(strcatid,new Integer(1));
> > >>>
> > >>> }
> > >>>
> > >>>
> > >>> is there any other better way to achieve this ????
> > >>>
> > >>>
> > >>> Thanks.
> > >>> Bhavin pandya
> > >>
> > >> ---------------------------------------------------------------------
> > >> To unsubscribe, e-mail: [EMAIL PROTECTED]
> > >> For additional commands, e-mail: [EMAIL PROTECTED]