17 mar 2006 kl. 16.36 skrev Java Programmer:
My problem concerns result grouping, the best example will be
Google search
where you have results sorted by relevance, and also grouped by
domain (they
have little indent/margin). In my project I want to get similar
functionality, without very huge CPU consumption.
Can you share any helpful hints ?
I do that. Basically I marshall the hit documents to java instances
of Comparable. Then I just plain old Collections.sort(the documents
as object representation). Each document may contain classification
weights. Weights points at a classifiction value, and the
classification value points at a clazz.
UML class diagram:
[Persistent]<#>--- {0..*} ->[ClassificationWeight +compareTo()
+weight:float]---- {1} ->[Classification + compareTo()
+value:String]--- {1} ->[Clazz +fieldName:String + compareTo()]
A clazz in this instance is the "group of domain names". The
classification is the actual domain name. You can skip the weight if
you only use domain names. I guess all weights would be 1.
The weight compares to classifications that compares to the clazz. If
two weights equal I use the lucene score.
You might want to do several passes or nested order to get the group
top score as the natrual order per group cluster.
It handles 3000+ queries per minute on 120000+ documents in RAM, 24/7
on a dual core at 40% load. And I even use Lucene for persistency
even though I should not.
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]