Thinking about this a bit more, it occurs to me that this paper only analyzes the corpus for collocations that are detectable in all contexts. One major strength of LDA is that it allows for polysemy. As such, it should be possible to find collocations that are limited to particular topic contexts much more sensitively than is possible when ignoring context.
Topic tagging of words also has strong potential for language modeling and retrieval in general, exactly because it allows for good sense disambiguation.

On Sat, Sep 5, 2009 at 4:27 PM, Ted Dunning <[email protected]> wrote:

> On Sat, Sep 5, 2009 at 1:58 PM, Sebastien Bratieres <[email protected]> wrote:
>
>> I've come across this article (Lafferty & Blei 2009)
>> http://www.citeulike.org/user/maximzhao/article/5084329 which seems to
>> build upon Ted's log likelihood ratio.
>
> Yes. And they do a good job of it. Their backoff model is essentially
> identical to one that I proposed in my dissertation, but the permutation
> test is a good addition for deciding which LR's are useful in finding new
> n-grams for the model. The permutation test that they propose is also very
> similar to the one that I used for analyzing genetic sequence data with
> LR's, although in a very different context.
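For anyone following along who hasn't seen the LLR in code: a minimal sketch of the G^2 statistic over a bigram's 2x2 contingency table might look like the following (the function name and layout here are illustrative, not Mahout's API):

```python
import math

def llr_2x2(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 contingency table.

    For a candidate bigram "A B":
      k11 = count(A followed by B)      k12 = count(A followed by not-B)
      k21 = count(not-A followed by B)  k22 = count(not-A followed by not-B)
    """
    total = k11 + k12 + k21 + k22
    row1, row2 = k11 + k12, k21 + k22
    col1, col2 = k11 + k21, k12 + k22

    def term(observed, expected):
        # 0 * log(0) is taken as 0 by convention
        return observed * math.log(observed / expected) if observed > 0 else 0.0

    # G^2 = 2 * sum over cells of O * ln(O / E), with E from the independence model
    return 2.0 * (
        term(k11, row1 * col1 / total)
        + term(k12, row1 * col2 / total)
        + term(k21, row2 * col1 / total)
        + term(k22, row2 * col2 / total)
    )

print(llr_2x2(100, 100, 100, 100))  # 0.0 -- counts match independence exactly
print(llr_2x2(10, 0, 0, 10))        # large -- A and B always co-occur
```

Restricting the counts to documents (or tokens) assigned to a single LDA topic, instead of the whole corpus, is all it would take to get the topic-conditional collocations I mentioned above.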
