Cosine similarity is just a dot product of weighted counts, divided by the square root of the product of the two squared norms.
Do the weighting in the mapper and the square root in the reducer. Everything in between is addition of products.

On Mon, Jul 18, 2011 at 2:53 PM, Sean Owen <[email protected]> wrote:

> How do you implement, for instance, the cosine similarity with this output?
> That's the intent behind preserving this info, which is surely a lot
> to preserve.
>
> On Mon, Jul 18, 2011 at 10:49 PM, Ted Dunning <[email protected]>
> wrote:
> > So argued. The output should be a pair and a count and the pair should be
> > the key. Or the output should be a named vector containing keys and indexed
> > by keys (requires a dictionary). Either form allows a combiner.
> >
> > Sent from my iPhone
> >
> > On Jul 18, 2011, at 14:41, Sean Owen <[email protected]> wrote:
> >
> >> Yes, but the output of the phase in question is *not* a count. It
> >> can't be combined.
> >> You could argue that this is the problem!
> >
>
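
To make the mapper/combiner/reducer split concrete, here is a rough in-memory sketch in Python. It is not the actual job under discussion. The names `weight`, `mapper`, `combine` and `cosine` are made up for illustration, the log weighting is an arbitrary stand-in since the thread does not name one, and counts are assumed to be positive. The point is only that every intermediate value is a product keyed by a pair, so partial sums can be merged anywhere before the final square root.

```python
import math
from collections import defaultdict
from itertools import combinations


def weight(count):
    # Hypothetical weighting, chosen only for illustration.
    return math.log1p(count)


def mapper(row):
    """row maps item -> raw count for one user; yields ((a, b), product)."""
    w = {item: weight(c) for item, c in row.items()}
    for a in w:
        # Diagonal key: contributes to the squared norm of item a.
        yield (a, a), w[a] * w[a]
    for a, b in combinations(sorted(w), 2):
        # Off-diagonal key: contributes to the dot product of a and b.
        yield (a, b), w[a] * w[b]


def combine(pairs):
    """Sum values per key. Addition is associative, so this same
    function can serve as the combiner and as the reducer's summing step."""
    sums = defaultdict(float)
    for key, value in pairs:
        sums[key] += value
    return sums


def cosine(sums):
    """Final step: divide each dot product by the square root of the
    product of the two squared norms (the diagonal sums)."""
    return {
        (a, b): dot / math.sqrt(sums[(a, a)] * sums[(b, b)])
        for (a, b), dot in sums.items()
        if a != b
    }


if __name__ == "__main__":
    rows = [{"x": 1, "y": 1}, {"x": 1}]
    sums = combine(kv for row in rows for kv in mapper(row))
    print(cosine(sums))
```

In a real job the diagonal sums would have to be joined back to the pair sums before the division; the sketch sidesteps that by keeping everything in one dictionary.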
