On Wed, Aug 12, 2009 at 10:50 AM, Ted Dunning<[email protected]> wrote:
> Whoa....
>
> No. It sounds like I have muddied things thoroughly. What I was saying is
> that there are times that tf.idf and llr agree and times that tf.idf and llr
> disagree. In my experience, most of the second category are where tf.idf is
> over-weighting coincidental cases or where both scores are producing not
> good stuff.
>
> If a phrase or term is marked as good by LLR and is a prominent feature of
> the centroid, that is fine.
>
Thanks for the explanation, Ted. Is this a necessary and sufficient condition for a good cluster label?

On a different note, is there any way to identify relationships among the top labels of a cluster? For example, if I have a cluster related to automobiles, I may get the companies (GM, Ford, Toyota) along with their popular models (Corolla, Cadillac) as top labels. How can I figure out that Toyota and Corolla are strongly related?

--shashi
