A couple of things strike me about LDA, and I wanted to hear others' thoughts:

1. In the LDA implementation (and my reading on topic models in general seems 
to reinforce this), the topics themselves don't have "names".  I can see why 
naming them is difficult (in some ways, you're summarizing a summary), but I'm 
curious whether anyone has done any work on it, since without names it still 
takes a fair amount of human effort to infer what the topics are.  I suppose 
you could just pick the top few terms, but it seems like a common phrase or 
something similar would go further.  Also, I believe someone in the past 
mentioned some more recent work by Blei and Lafferty (Blei and Lafferty. 
Visualizing Topics with Multi-Word Expressions. stat (2009) vol. 1050 pp. 6) 
that tries to alleviate that.

2. We get the words in the topic, but how do we know which documents have those 
topics?  I think, based on reading the paper, that the answer is "You don't get 
to know", but I'm not sure.  
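
To make both points concrete, here is a minimal Python sketch of what I have 
in mind.  Everything in it is an assumption for illustration: the vocabulary 
and numbers are made up, label_topic and docs_for_topic are hypothetical 
helpers, and it assumes a per-document topic proportion matrix is available, 
which I'm not sure the implementation exposes.

```python
# Toy, made-up numbers purely for illustration; not output from a real LDA run.
vocab = ["gene", "dna", "cell", "court", "law", "judge"]

# Hypothetical topic-word weights: one row per topic, one column per vocab term.
topic_word = [
    [0.40, 0.30, 0.20, 0.05, 0.03, 0.02],
    [0.02, 0.03, 0.05, 0.35, 0.30, 0.25],
]

# Hypothetical per-document topic proportions: one row per document.
# Whether the implementation can produce this matrix is an assumption.
doc_topic = [
    [0.90, 0.10],
    [0.20, 0.80],
    [0.55, 0.45],
]


def label_topic(weights, vocab, k=3):
    """Crude label: the k highest-weighted terms, joined by spaces."""
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return " ".join(vocab[i] for i in ranked[:k])


def docs_for_topic(doc_topic, topic, threshold=0.5):
    """Indices of documents whose proportion for `topic` is >= threshold."""
    return [d for d, props in enumerate(doc_topic) if props[topic] >= threshold]


labels = [label_topic(row, vocab) for row in topic_word]
print(labels)
print(docs_for_topic(doc_topic, 0))
```

The top-terms label is the "just pick the top few terms" fallback from point 
1; a phrase-based label would need something smarter.  The 0.5 threshold for 
point 2 is arbitrary.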


-Grant
