A simple solution if you only have 20,000 docs is just to iterate through the hits and count them up against each color etc, this could be in a HitCollector. The balance here is performance vs memory usage, if you have a lot of users I would go for a solution that was less efficient but used a lot less memory.
Mike www.ardentia.com the home of NetSearch -----Original Message----- From: zzzzz shalev [mailto:[EMAIL PROTECTED] Sent: 30 January 2006 17:16 To: java-user@lucene.apache.org Subject: Re: grouping results by fields hey Jim, thanks alot for the quick reply! much appreciated i will look a little closer into what is done in C|Net , seems more cost efficient than what im currently doing ;) however i am not sure how scaleable the solution is if , for example, i recieved 20,000 results and i have 2000 different colors (a possible scenario in my case) Jim Powers <[EMAIL PROTECTED]> wrote: We're doing something very similar. Recently C|Net started using Lucene and there is a blog entry about how they implemented a "category" scheme that basically does what you want. http://www.nabble.com/Announcement%3A-Lucene-powering-CNET.com-Product-C ategory-Listings-t266441.html#a748420 The idea is to generate a bunch of bitsets where the ordinal bit number is based on the document number in the index. First result set: perform your search. Then generate a bitset called FRSBS (full result set bitset). Next: foreach "word" you want to count generate a bitset (you do this by traversing the index directly for each word). Call these bitsets WBS(n) (word bit set n. n spans 1..m where m is the total number of words you want to count against) Finally: to get a count per word bit-wise AND each WBS(n) with FRSBS and count up the 1s Jim Powers On Sunday 29 January 2006 07:55, zzzzz shalev wrote: > hey, > > i have a bit of a complex problem, > i need to group results recieved in a result set, > for example: > > my result set returns 10,000 results > > there are about 10 fields in each result document > > i need to group the most frequent values appearing in each field. > > if 1 of my fields is named color: then i want the 20 most frequent colors > in the 10,000 results,,, > > blue - occurs 1302 times > red - occurs 200 times > > and so on.... > > any ideas will be unbelievably appreciated, > > thanx > > > --------------------------------- > > What are the most popular cars? Find out at Yahoo! Autos --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] --------------------------------- Do you Yahoo!? With a free 1 GB, there's more in store with Yahoo! Mail. --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]