Re: Search Results Clustering

Ray Tsang Tue, 30 Aug 2005 22:10:48 -0700

I had a similar requirement for "count" and "group by" over more than
130 million records, and it's really a pain.  My current solution is
usable but not satisfactory.

Currently it groups at run time by iterating through the ungrouped
items.  It collects the matching documents into a BitSet, so subsequent
queries can use the BitSet to retrieve the results of the original
query.  Moreover, documents that have already been grouped can be
marked off in the BitSet.
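
A minimal sketch of the run-time grouping described above, not the
actual code: it assumes the group-by value of each document is already
available in a plain array (keyByDoc, hypothetical), and that the
matches of the original query are held in a java.util.BitSet
(remaining).  The names GroupBySketch and groupPage are made up for
illustration, and no Lucene classes are used.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

final class GroupBySketch {

    /**
     * Builds at most pageSize groups from the documents still set in
     * `remaining`.
     *
     * keyByDoc[i] stands in for the group-by field value of document i
     * (an assumption; in practice it would be read from the index).
     * `remaining` holds the matches of the original query that have not
     * been grouped yet. It is modified in place, so calling this method
     * again continues with the next page of groups.
     */
    static Map<String, BitSet> groupPage(String[] keyByDoc, BitSet remaining, int pageSize) {
        Map<String, BitSet> groups = new LinkedHashMap<>();
        for (int doc = remaining.nextSetBit(0);
             doc >= 0 && groups.size() < pageSize;
             doc = remaining.nextSetBit(doc + 1)) {
            String key = keyByDoc[doc];
            // Collect every ungrouped match that shares this key.
            BitSet members = new BitSet(keyByDoc.length);
            for (int other = remaining.nextSetBit(doc);
                 other >= 0;
                 other = remaining.nextSetBit(other + 1)) {
                if (key.equals(keyByDoc[other])) {
                    members.set(other);
                }
            }
            // Mark the grouped documents off so they are not visited again.
            remaining.andNot(members);
            groups.put(key, members);
        }
        return groups;
    }
}
```

Because each call stops after pageSize groups, the total number of
groups is only known once remaining is empty.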

On a page that shows 10 records, it groups only 10 records at a time.
Consequently, there is no way to know the total number of grouped
records up front.

In addition, it feels like reading the field values from each document
to look up the group-by value is the most time-consuming part.

How does an RDBMS do it?

ray,

On 8/31/05, kapilChhabra (sent by Nabble.com) <[EMAIL PROTECTED]> wrote:
> 
> thanks a lot for your suggestion.
> I'll try it and get back if need be.
> 
> Meanwhile, I gave it some thought and concluded that the best time to do the
> categorization/clustering is when Lucene calculates the Hits, in the Scorer.
> I am not sure if I am right.
> In addition to the current functionality, could we modify the Scorer class to
> add the following feature:
> The class generates a two-dimensional array for the clustered field: the first
> dimension contains the distinct values of the field and the second dimension
> contains the count of results under each value. The count is incremented for
> each acceptable hit.
> Does it make sense?
> If it is possible, I'll dig deeper into the code of the Hits/Scorer classes.
> 
> Thanks in advance,
> kapilChhabra
> 
> 
> --
> Sent from the Lucene - Java Users forum at Nabble.com:
> http://www.nabble.com/Search-Results-Clustering-t249355.html#a748901
> 
>
