Re: Returning a minimum number of clusters

Grant Ingersoll Mon, 01 May 2006 10:22:00 -0700

You might be interested in the Carrot project, which has some Lucene support. I don't know if it solves your second problem, but it already implements clustering and may allow you to get to an answer for the second problem quicker. I have, just recently, started using it for a clustering task I am working on related to search results. I think the author of Carrot is on the user list from time to time.



Marvin Humphrey wrote:

Greets,
I'm toying with the idea of implementing clustering of search results based on comparison of document vectors constrained by field. For instance, you could cluster based on "topic", or "domain", or "content". "domain" would be easy, as it's presumably a single value field. "content" would be much more involved.
The problem I'm trying to solve is how to return a minimum number of clusters from a search. Say the most relevant 100 documents for a query are all from the same domain, but you want a maximum of two results per domain, a la Google. I don't see any alternative to rerunning the query an indeterminate number of times until you've accumulated sufficient clusters, because the search logic doesn't know what cluster a document belongs to until the document vector is retrieved.
Is there a better way?
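
For concreteness, here is a minimal sketch of the rerun-until-enough approach described above, in Python. Everything named here is an assumption made for illustration: `search(query, offset, limit)` is a hypothetical function returning ranked `(doc_id, domain)` pairs, and `collapse_by_domain`, `page_size`, and `max_hits` are invented names, not part of any real search library.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical hit shape: (doc_id, domain), already sorted by relevance.
Hit = Tuple[str, str]


def collapse_by_domain(
    search: Callable[[str, int, int], List[Hit]],
    query: str,
    min_clusters: int,
    per_domain: int = 2,
    page_size: int = 100,
    max_hits: int = 10_000,
) -> Dict[str, List[str]]:
    """Fetch successive pages of hits until at least `min_clusters`
    distinct domains have been seen, keeping at most `per_domain`
    documents from each domain."""
    clusters: Dict[str, List[str]] = {}  # insertion order follows rank order
    offset = 0
    while len(clusters) < min_clusters and offset < max_hits:
        page = search(query, offset, page_size)
        if not page:
            break  # result set exhausted before enough clusters were found
        for doc_id, domain in page:
            bucket = clusters.setdefault(domain, [])
            if len(bucket) < per_domain:
                bucket.append(doc_id)
        offset += len(page)
    return clusters
```

The number of fetches depends on how deep in the ranking the other domains first appear, which is the indeterminate cost mentioned above. The `max_hits` cap is only there to bound the work when a query never yields enough distinct domains.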

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

--

Grant Ingersoll
Sr. Software Engineer
Center for Natural Language Processing
Syracuse University
School of Information Studies
335 Hinds Hall
Syracuse, NY 13244
http://www.cnlp.org
Voice: 315-443-5484
Fax: 315-443-6886

