http://www.lucidimagination.com/search/document/3ae15062f35420cf/lda_for_multi_label_classification_was_mahout_book
David gave me a very nice paper which talks about tag-document correlation. If you start with named labels, it does end up being a naive Bayes classifier.

On Mon, Jan 11, 2010 at 2:23 AM, Grant Ingersoll <[email protected]> wrote:

> A couple of things strike me about LDA, and I wanted to hear others'
> thoughts:
>
> 1. In the LDA implementation (and this seems to be reinforced by my reading
> on topic models in general), the topics themselves don't have "names". I
> can see why this is difficult (in some ways, you're summarizing a summary),
> but am curious whether anyone has done any work on such a thing, as w/o them
> it still requires a fair amount of work by the human to infer what the
> topics are. I suppose you could just pick the top few terms, but it seems
> like a common phrase or something would go further. Also, I believe someone
> in the past mentioned some more recent work by Blei and Lafferty (Blei and
> Lafferty. Visualizing Topics with Multi-Word Expressions. stat (2009) vol.
> 1050 pp. 6) to alleviate that.
>
> 2. We get the words in the topic, but how do we know which documents have
> those topics? I think, based on reading the paper, that the answer is "You
> don't get to know", but I'm not sure.
>

If I am correct, you do get to know. Based on the words in a document, you can tell which of those unlabelled topics are in it, with an affinity score for each. You can sort the scores, or do some form of significance testing, to keep only the topics that matter.

>
> -Grant
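To make the two points above concrete, here is a minimal sketch in plain Python. It assumes you already have a trained model's output in two tables: per-topic term weights and per-document topic affinity scores. All names (`TOPIC_TERMS`, `DOC_TOPICS`, `label_topic`, `significant_topics`, `describe`) and all numbers are made up for illustration and are not Mahout's API or output. The fixed `min_score` cutoff is only a stand-in for whatever significance test you would actually use.

```python
# Hypothetical data: in practice these numbers would come from a trained LDA model.
# topic id -> {term: weight}
TOPIC_TERMS = {
    0: {"court": 0.09, "judge": 0.07, "law": 0.05, "case": 0.02},
    1: {"game": 0.08, "team": 0.06, "season": 0.04, "coach": 0.03},
    2: {"stock": 0.07, "market": 0.06, "price": 0.03, "trade": 0.02},
}

# document id -> {topic id: affinity score}; each row sums to 1.0
DOC_TOPICS = {
    "doc-a": {0: 0.72, 1: 0.03, 2: 0.25},
    "doc-b": {0: 0.05, 1: 0.90, 2: 0.05},
}


def label_topic(term_weights, n=3):
    """Name an unlabelled topic by joining its n highest-weighted terms."""
    top = sorted(term_weights.items(), key=lambda kv: kv[1], reverse=True)[:n]
    return " / ".join(term for term, _ in top)


def significant_topics(topic_scores, min_score=0.2):
    """Return (topic id, score) pairs at or above min_score, highest first."""
    kept = [(t, s) for t, s in topic_scores.items() if s >= min_score]
    return sorted(kept, key=lambda ts: ts[1], reverse=True)


def describe(doc_id, min_score=0.2):
    """List the labelled, significant topics for one document."""
    return [
        (label_topic(TOPIC_TERMS[t]), s)
        for t, s in significant_topics(DOC_TOPICS[doc_id], min_score)
    ]
```

Calling `describe("doc-a")` returns the topics that clear the cutoff for that document, each named by its top three terms. A top-terms label is the crude option Grant mentions; a phrase-based label would need the multi-word approach he cites.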
