On Dec 31, 2009, at 3:39 PM, Ted Dunning wrote:
> - can the clustering algorithm be viewed in a probabilistic framework
> (k-means, LDA, Dirichlet = yes, agglomerative clustering using nearest
> neighbors = not so much)
>
> - is the definition of a cluster abstract enough to be flexible with regard
> to whether a cluster is a model or does it require stronger limits.
> (k-means = symmetric Gaussian with equal variance, Dirichlet = almost any
> probabilistic model)

Can you elaborate a bit more on these two? I can see a bit on the probability side, as those approaches play a factor in how similarity is determined, but I don't get the significance of "cluster as a model". Is it just a simplification that then makes it easier to ask: does this document fit into the model?

-Grant
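
One possible reading of the quoted "k-means = symmetric Gaussian with equal variance" remark, as a minimal sketch rather than anything from the thread or from Mahout's code: if each cluster is treated as a spherical Gaussian with the same variance, then asking "which model does this point fit best?" gives the same answer as the k-means rule "which centroid is nearest?". The function names, the `sigma` parameter, and the sample points below are all hypothetical, chosen only for illustration.

```python
import math


def nearest_centroid(point, centroids):
    """k-means style hard assignment: index of the closest centroid
    by squared Euclidean distance."""
    dists = [
        sum((p - c) ** 2 for p, c in zip(point, centroid))
        for centroid in centroids
    ]
    return dists.index(min(dists))


def log_likelihood(point, mean, sigma=1.0):
    """Log density of a spherical Gaussian centred at `mean`,
    with the same standard deviation `sigma` in every dimension."""
    d = len(point)
    sq = sum((p - m) ** 2 for p, m in zip(point, mean))
    return (
        -0.5 * sq / sigma ** 2
        - d * math.log(sigma)
        - 0.5 * d * math.log(2 * math.pi)
    )


def most_likely_model(point, means, sigma=1.0):
    """'Cluster as a model' assignment: index of the Gaussian under
    which the point has the highest log-likelihood. All models share
    one sigma (the equal-variance assumption)."""
    scores = [log_likelihood(point, m, sigma) for m in means]
    return scores.index(max(scores))


# Hypothetical centroids and points, for illustration only.
centroids = [(0.0, 0.0), (5.0, 5.0), (0.0, 6.0)]
points = [(1.0, 0.5), (4.0, 4.5), (0.5, 5.0), (2.0, 3.5)]

for p in points:
    # With a shared sigma, the log-likelihood is a decreasing function
    # of squared distance, so both rules pick the same cluster.
    assert nearest_centroid(p, centroids) == most_likely_model(p, centroids)
```

Under this reading, a more flexible cluster definition would amount to swapping `log_likelihood` for some other probabilistic model while leaving the "does this point fit the model?" question unchanged.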
