Greets,

I'm toying with the idea of clustering search results by comparing document vectors restricted to a single field. For instance, you could cluster based on "topic", "domain", or "content". "domain" would be easy, as it's presumably a single-valued field. "content" would be much more involved.
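For a multi-valued field like "content", one common way to compare two documents is cosine similarity over their term-frequency vectors for that field. Here is a minimal sketch, assuming each document's term vector for the chosen field has already been loaded into a term-to-frequency map. `FieldVectorSimilarity` and `cosine` are hypothetical names, not part of Lucene's API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: compares two documents by the term vectors of one field. */
public class FieldVectorSimilarity {

    /** Cosine similarity of two term-frequency maps; returns 0.0 if either is empty. */
    public static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        // Walk the smaller map and look up shared terms in the larger one.
        Map<String, Integer> small = a.size() <= b.size() ? a : b;
        Map<String, Integer> large = (small == a) ? b : a;
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : small.entrySet()) {
            Integer other = large.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (normA * normB);
    }

    /** Euclidean length of a term-frequency vector. */
    private static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int f : v.values()) {
            sum += (double) f * f;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        // Two toy "content" vectors sharing only the term "lucene".
        Map<String, Integer> doc1 = new HashMap<>();
        doc1.put("lucene", 3);
        doc1.put("search", 1);
        Map<String, Integer> doc2 = new HashMap<>();
        doc2.put("lucene", 1);
        doc2.put("cluster", 2);
        System.out.println(cosine(doc1, doc2));
    }
}
```

Documents whose similarity exceeds some threshold could then be grouped together. The threshold, and whether to weight terms (e.g. by IDF) before comparing, are open choices not settled here.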

The problem I'm trying to solve is how to return at least some minimum number of clusters from a search. Say the 100 most relevant documents for a query all come from the same domain, but you want a maximum of two results per domain, à la Google. I don't see any alternative to rerunning the query an indeterminate number of times until you've accumulated enough clusters, because the search logic doesn't know which cluster a document belongs to until its document vector is retrieved.
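The rerun-until-enough approach can be sketched roughly as follows. This is a minimal illustration, not Lucene's API: `TopHits`, `Hit`, and `collapse` are hypothetical names. It assumes a search callback that returns the top n hits in rank order, with each hit's cluster key (e.g. its domain) already known once the hit is retrieved. Doubling the window on each pass is an assumed policy.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: keep at most N hits per cluster, re-running the query until enough clusters appear. */
public class ClusterCollapser {

    /** A ranked hit plus the cluster key read from its document (hypothetical). */
    public static final class Hit {
        public final int docId;
        public final String clusterKey;

        public Hit(int docId, String clusterKey) {
            this.docId = docId;
            this.clusterKey = clusterKey;
        }
    }

    /** Hypothetical search hook: returns the top {@code n} hits for the query, best first. */
    public interface TopHits {
        List<Hit> search(int n);
    }

    /**
     * Re-runs the search with a doubling window until at least {@code minClusters}
     * distinct clusters are found or the result set is exhausted. Keeps at most
     * {@code perCluster} hits per cluster, in rank order.
     */
    public static Map<String, List<Hit>> collapse(TopHits searcher, int minClusters,
                                                  int perCluster, int initialWindow) {
        if (initialWindow < 1 || perCluster < 1) {
            throw new IllegalArgumentException("window and perCluster must be positive");
        }
        int window = initialWindow;
        while (true) {
            // Each pass starts over: the query is re-executed with a larger window.
            List<Hit> hits = searcher.search(window);
            Map<String, List<Hit>> clusters = new LinkedHashMap<>();
            for (Hit hit : hits) {
                List<Hit> bucket = clusters.computeIfAbsent(hit.clusterKey, k -> new ArrayList<>());
                if (bucket.size() < perCluster) {
                    bucket.add(hit);
                }
            }
            // Stop when we have enough clusters, when the index returned fewer hits
            // than requested (nothing more to find), or before the window overflows.
            if (clusters.size() >= minClusters || hits.size() < window
                    || window > Integer.MAX_VALUE / 2) {
                return clusters;
            }
            window *= 2;
        }
    }
}
```

With 100 top hits from one domain, two results per domain, and a starting window of 10, this re-runs the query at windows of 10, 20, 40, 80, and 160 before a second domain shows up. That makes the cost of the "indeterminate number of times" concrete.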

Is there a better way?

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

