Greets,
I'm toying with the idea of implementing clustering of search results
based on comparison of document vectors constrained by field. For
instance, you could cluster based on "topic", or "domain", or
"content". "domain" would be easy, as it's presumably a single-valued
field. "content" would be much more involved.
The problem I'm trying to solve is how to guarantee that a search
returns at least a minimum number of clusters. Say the most relevant
100 documents for a query are all from the same domain, but you want a
maximum of two results per domain, à la Google. I don't see any alternative to
rerunning the query an indeterminate number of times until you've
accumulated sufficient clusters, because the search logic doesn't
know what cluster a document belongs to until the document vector is
retrieved.
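To make the brute-force approach concrete, here's a rough sketch in
Python. Every name in it (`collect_clustered_hits`, `search`,
`cluster_of`, `page_size`) is made up for illustration and isn't an
existing API: `search(offset, limit)` stands in for rerunning the query
for the next slice of ranked hits, and `cluster_of(doc_id)` stands in
for retrieving the document vector and reading off the cluster key.

```python
def collect_clustered_hits(search, cluster_of, min_clusters,
                           max_per_cluster=2, page_size=100):
    """Page through ranked hits until enough distinct clusters are seen.

    search(offset, limit) -> list of doc ids in relevance order (hypothetical).
    cluster_of(doc_id)    -> cluster key such as a domain (hypothetical).
    """
    kept = []          # doc ids that survive the per-cluster cap
    per_cluster = {}   # cluster key -> number of hits kept so far
    offset = 0
    while len(per_cluster) < min_clusters:
        hits = search(offset, page_size)
        if not hits:
            break      # index exhausted before reaching min_clusters
        for doc_id in hits:
            # The cluster is only known after this per-document lookup.
            key = cluster_of(doc_id)
            count = per_cluster.get(key, 0)
            if count < max_per_cluster:
                per_cluster[key] = count + 1
                kept.append(doc_id)
        offset += len(hits)
    return kept
```

The number of trips through the loop depends entirely on how the
clusters happen to be distributed in the ranking, which is the
"indeterminate number of times" I'm unhappy about.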
Is there a better way?
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/