17 mar 2006 kl. 16.36 skrev Java Programmer:

My problem concerns result grouping, the best example will be Google search where you have results sorted by relevance, and also grouped by domain (they
have little indent/margin). In my project I want to get similar
functionality, without very huge CPU consumption.

Can you share any helpful hints ?

I do that. Basically I marshall the hit documents to java instances of Comparable. Then I just plain old Collections.sort(the documents as object representation). Each document may contain classification weights. Weights points at a classifiction value, and the classification value points at a clazz.

UML class diagram:

[Persistent]<#>--- {0..*} ->[ClassificationWeight +compareTo() +weight:float]---- {1} ->[Classification + compareTo() +value:String]--- {1} ->[Clazz +fieldName:String + compareTo()]

A clazz in this instance is the "group of domain names". The classification is the actual domain name. You can skip the weight if you only use domain names. I guess all weights would be 1.

The weight compares to classifications that compares to the clazz. If two weights equal I use the lucene score.

You might want to do several passes or nested order to get the group top score as the natrual order per group cluster.

It handles 3000+ queries per minute on 120000+ documents in RAM, 24/7 on a dual core at 40% load. And I even use Lucene for persistency even though I should not.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to