On Wed, Aug 6, 2014 at 4:57 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> On Wed, Aug 6, 2014 at 5:49 PM, Dmitriy Lyubimov <dlie...@gmail.com>
> wrote:
>
> > On Wed, Aug 6, 2014 at 4:21 PM, Dmitriy Lyubimov <dlie...@gmail.com>
> > wrote:
> >
> > > I suppose in that context LLR is considered a distance (higher scores
> > > mean more `distant` items, co-occurring by chance only)?
> >
> > Self-correction on this one: having given a quick look at the LLR paper
> > again, it looks like it is actually a similarity (higher scores meaning
> > more stable co-occurrences, i.e. it moves in the opposite direction of
> > p-value if it had been a classic test).
>
> LLR is a classic test.

What I meant here is that it doesn't produce a p-value. Or does it?

> It is essentially Pearson's chi^2 test without the normal approximation.
> See my papers[1][2] introducing the test into computational linguistics
> (which ultimately brought it into all kinds of fields including
> recommendations) and also references for the G^2 test[3].
>
> [1] http://www.aclweb.org/anthology/J93-1003
> [2] http://arxiv.org/abs/1207.1847
> [3] http://en.wikipedia.org/wiki/G-test
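
On the p-value question, here is a minimal sketch of how the G^2 statistic on a 2x2 co-occurrence table can be mapped to one. It assumes the usual large-sample result that, under independence, G^2 is approximately chi-squared with one degree of freedom. The function names and the example counts are made up for illustration and are not taken from any library.

```python
import math


def g2_2x2(k11, k12, k21, k22):
    """G^2 (log-likelihood ratio) statistic for a 2x2 contingency table.

    k11: both events occur; k12, k21: only one occurs; k22: neither occurs.
    Computes 2 * sum(O * ln(O / E)) over the four cells, where E is the
    count expected under independence of rows and columns.
    """
    observed = ((k11, k12), (k21, k22))
    n = k11 + k12 + k21 + k22
    row_totals = (k11 + k12, k21 + k22)
    col_totals = (k11 + k21, k12 + k22)

    total = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing (0 * ln 0 -> 0)
                e = row_totals[i] * col_totals[j] / n
                total += o * math.log(o / e)
    return 2.0 * total


def p_value_1df(g2):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom.

    For 1 df, P(X > x) = erfc(sqrt(x / 2)). This is only the asymptotic
    approximation, so treat it as rough when counts are small.
    """
    return math.erfc(math.sqrt(max(g2, 0.0) / 2.0))


if __name__ == "__main__":
    # Hypothetical counts: two items co-occurring far more often than chance.
    score = g2_2x2(100, 10, 10, 100)
    print(score, p_value_1df(score))
```

So a higher score corresponds to a smaller p-value, which is consistent with reading the raw score as a similarity: it can be ranked directly, with the p-value conversion only needed if an actual significance level is wanted.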