Right! But how do you do that if you only saved co-occurrence counts?

You can surely pull a similarly shaped trick to calculate the cosine
measure; in fact, that's exactly what this paper is doing. But it's a
different computation.
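
To make "different computation" concrete, here is a minimal sketch, assuming
the vectors carry weights (ratings, say) rather than plain 0/1 flags. The
helper names `cooccurrence` and `cosine` and the toy vectors are made up for
illustration and are not taken from the job. Both pairs below share the same
two columns, so their co-occurrence count is identical, yet their cosine
differs, which is why the count alone is not enough:

```python
import math


def cooccurrence(u, v):
    """Number of columns in which both sparse vectors have an entry."""
    return len(u.keys() & v.keys())


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {column: value}."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)


# Hypothetical toy data: same non-zero pattern, different weights.
a1, b1 = {"x": 1.0, "y": 1.0}, {"x": 1.0, "y": 1.0}
a2, b2 = {"x": 3.0, "y": 1.0}, {"x": 1.0, "y": 3.0}

print(cooccurrence(a1, b1), cooccurrence(a2, b2))  # same count for both pairs
print(cosine(a1, b1), cosine(a2, b2))              # roughly 1.0 versus 0.6
```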

Right now the job saves *all* the info it might need to calculate any
of these things later. And that's heavy.

On Mon, Jul 18, 2011 at 11:06 PM, Jake Mannix <[email protected]> wrote:
> On Mon, Jul 18, 2011 at 2:53 PM, Sean Owen <[email protected]> wrote:
>
>> How do you implement, for instance, the cosine similarity with this output?
>> That's the intent behind preserving this info, which is surely a lot
>> to preserve.
>>
>
> Sorry to jump in the middle of this, but cosine is not too hard to do with
> nice combiners, as it can be done by first normalizing the rows and then
> doing my ubiquitous "outer product of columns" trick on the resultant
> corpus (this latter job uses combiners easily because the mappers do all
> multiplications, and all reducers are simply sums, and thus are commutative
> and associative).
>
> Not sure about the other fancy similarities.
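
For reference, the approach quoted above (normalize the rows, then sum
per-column outer products) might be sketched roughly as follows. This is a
single-process imitation in plain Python, not the actual job: the function
names, the dict-of-dicts layout and the `n_splits` parameter are all
assumptions made for illustration. The point it tries to show is that the
mapper does every multiplication, so the combiner and the reducer can be the
same plain sum.

```python
import math
from collections import defaultdict
from itertools import combinations_with_replacement


def normalize_rows(rows):
    """Scale each sparse row {column: value} to unit L2 norm."""
    out = {}
    for r, vec in rows.items():
        norm = math.sqrt(sum(v * v for v in vec.values()))
        out[r] = {c: v / norm for c, v in vec.items()} if norm else {}
    return out


def to_columns(rows):
    """Transpose {row: {column: value}} into {column: {row: value}}."""
    cols = defaultdict(dict)
    for r, vec in rows.items():
        for c, v in vec.items():
            cols[c][r] = v
    return cols


def map_column(col):
    """Mapper: emit one product per pair of rows present in this column."""
    for (i, vi), (j, vj) in combinations_with_replacement(sorted(col.items()), 2):
        yield (i, j), vi * vj


def sum_values(pairs):
    """Combiner and reducer: addition only, so partial sums merge in any order."""
    totals = defaultdict(float)
    for key, value in pairs:
        totals[key] += value
    return totals


def cosine_similarities(rows, n_splits=2):
    """Imitate map -> combine (per split) -> reduce over the normalized columns."""
    cols = list(to_columns(normalize_rows(rows)).values())
    splits = [cols[k::n_splits] for k in range(n_splits)]
    combined = [
        sum_values(kv for col in split for kv in map_column(col))
        for split in splits
    ]
    return sum_values(kv for part in combined for kv in part.items())


# Hypothetical toy input: three rows over columns x, y, z.
rows = {
    "a": {"x": 3.0, "y": 4.0},
    "b": {"x": 4.0, "y": 3.0},
    "c": {"z": 2.0},
}
sims = cosine_similarities(rows)
print(sims[("a", "b")])
```

Pairs of rows that share no column never get emitted, so they are simply
absent from the result rather than stored as zeros. Changing `n_splits`
should not change the sums beyond floating-point rounding, which is the
commutative and associative property the combiner relies on.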
