BTW, the cooccurrence code is going into RSJ too, and there are uses of it 
where cosine is expected. I don't know how to think about cross-cosine. Is 
there an argument for LLR-only in RSJ?

On Aug 6, 2014, at 5:20 PM, Sebastian Schelter <ssc.o...@googlemail.com> wrote:

Sounds good to me.

-s
Am 06.08.2014 17:15 schrieb "Dmitriy Lyubimov" <dlie...@gmail.com>:

> What I mean here is that I probably need to refactor it a little so that
> there's a part of the algorithm that accepts co-occurrence input directly
> and is somewhat decoupled from the part that accepts user x item input and
> does downsampling and co-occurrence construction. That way I could do some
> customization of my own on the co-occurrence construction. Would that be
> reasonable?
> 
> 
> On Wed, Aug 6, 2014 at 5:12 PM, Dmitriy Lyubimov <dlie...@gmail.com>
> wrote:
> 
>> I'm asking because I am considering pulling in this implementation, but
>> for some (mostly political) reasons people want to try different things
>> here.
>> 
>> I may also have to start with a different way of constructing
>> co-occurrences, and may do a few optimizations there (e.g. the priority
>> queue enqueuing does twice the work it really needs to, etc.).
>> 
>> 
>> 
>> 
>> On Wed, Aug 6, 2014 at 5:05 PM, Sebastian Schelter <
>> ssc.o...@googlemail.com> wrote:
>> 
>>> I chose against porting all the similarity measures to the DSL version
>>> of the cooccurrence analysis for two reasons. First, adding the measures
>>> in a generalizable way makes the code very hard to read. Second, in
>>> practice, I have never seen anything give better results than LLR. As
>>> Ted pointed out, a lot of the foundations of using similarity measures
>>> come from wanting to predict ratings, which people never do in practice.
>>> I think we should restrict ourselves to approaches that work with
>>> implicit, count-like data.
>>> 
>>> -s
>>> Am 06.08.2014 16:58 schrieb "Ted Dunning" <ted.dunn...@gmail.com>:
>>> 
>>>> On Wed, Aug 6, 2014 at 5:49 PM, Dmitriy Lyubimov <dlie...@gmail.com>
>>>> wrote:
>>>> 
>>>>> On Wed, Aug 6, 2014 at 4:21 PM, Dmitriy Lyubimov <dlie...@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> I suppose in that context LLR is considered a distance (higher
>>>>>> scores mean more `distant` items, co-occurring by chance only)?
>>>>>> 
>>>>> 
>>>>> Self-correction on this one -- having given a quick look at the LLR
>>>>> paper again, it looks like it is actually a similarity (higher scores
>>>>> meaning more stable co-occurrences; i.e., it moves in the opposite
>>>>> direction of the p-value of a classic test).
>>>>> 
>>>> 
>>>> LLR is a classic test.  It is essentially Pearson's chi^2 test without
>>>> the normal approximation.  See my papers[1][2] introducing the test
>>>> into computational linguistics (which ultimately brought it into all
>>>> kinds of fields, including recommendations) and also references for
>>>> the G^2 test[3].
>>>> 
>>>> [1] http://www.aclweb.org/anthology/J93-1003
>>>> [2] http://arxiv.org/abs/1207.1847
>>>> [3] http://en.wikipedia.org/wiki/G-test
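The G^2 statistic Ted describes can be sketched compactly via its entropy formulation (LLR = 2N(H(rows) + H(cols) - H(matrix)) for a 2x2 contingency table of co-occurrence counts). The function names below are illustrative, not Mahout's API:

```python
import math

def entropy(*counts):
    """Shannon entropy (in nats) of a distribution given as raw counts."""
    n = sum(counts)
    return sum(-c / n * math.log(c / n) for c in counts if c > 0)

def llr(k11, k12, k21, k22):
    """G^2 (log-likelihood ratio) statistic for a 2x2 contingency table.

    k11: users who interacted with both items A and B
    k12: users who interacted with A but not B
    k21: users who interacted with B but not A
    k22: users who interacted with neither
    """
    n = k11 + k12 + k21 + k22
    row_entropy = entropy(k11 + k12, k21 + k22)
    col_entropy = entropy(k11 + k21, k12 + k22)
    mat_entropy = entropy(k11, k12, k21, k22)
    return 2 * n * (row_entropy + col_entropy - mat_entropy)
```

Note the direction of the score: independent counts (e.g. a perfectly uniform table) give LLR near zero, while strongly anomalous co-occurrence gives a large positive value, consistent with reading it as a similarity rather than a distance.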
>>>> 
>>> 
>> 
>> 
> 
