We're doing something very similar.  Recently C|Net started using Lucene and 
there is a blog entry about how they implemented a "category" scheme that 
basically does what you want.

http://www.nabble.com/Announcement%3A-Lucene-powering-CNET.com-Product-Category-Listings-t266441.html#a748420

The idea is to generate a bunch of bitsets where the ordinal bit number is 
based on the document number in the index.

First result set: perform your search.  Then generate a bitset called FRSBS 
(full result set bitset).

Next: foreach "word" you want to count generate a bitset (you do this by 
traversing the index directly for each word).  Call these bitsets WBS(n) 
(word bit set n.  n spans 1..m where m is the total number of words you want 
to count against)

Finally: to get a count per word bit-wise AND each WBS(n) with FRSBS and count 
up the 1s

Jim Powers

On Sunday 29 January 2006 07:55, zzzzz shalev wrote:
> hey,
>
>   i have a bit of a complex problem,
>   i need to group results recieved in a result set,
>   for example:
>
>   my result set returns 10,000 results
>
>   there are about 10 fields in each result document
>
>   i need to group the most frequent values appearing in each field.
>
>   if 1 of my fields is named color: then i want the 20 most frequent colors
> in the 10,000 results,,,
>
>   blue - occurs 1302 times
>   red  - occurs  200 times
>
>   and so on....
>
>   any ideas will be unbelievably appreciated,
>
>   thanx
>
>
> ---------------------------------
>
>  What are the most popular cars? Find out at Yahoo! Autos

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to