On Mon, Jul 18, 2011 at 2:53 PM, Sean Owen <[email protected]> wrote:

> How do you implement, for instance, the cosine similarity with this output?
> That's the intent behind preserving this info, which is surely a lot
> to preserve.

Sorry to jump into the middle of this, but cosine is not too hard to
compute with nice combiners: first normalize the rows, then do my
ubiquitous "outer product of columns" trick on the resulting corpus.
This latter job uses combiners easily because the mappers do all the
multiplications and the reducers are simply sums, which are commutative
and associative. Not sure about the other fancy similarities.

> On Mon, Jul 18, 2011 at 10:49 PM, Ted Dunning <[email protected]>
> wrote:
>
> > So argued. The output should be a pair and a count and the pair should be
> > the key. Or the output should be a named vector containing keys and indexed
> > by keys (requires a dictionary). Either form allows a combiner.
> >
> > Sent from my iPhone
> >
> > On Jul 18, 2011, at 14:41, Sean Owen <[email protected]> wrote:
> >
> >> Yes, but the output of the phase in question is *not* a count. It
> >> can't be combined.
> >> You could argue that this is the problem!
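
For concreteness, here is a rough sketch of the normalize-then-outer-product idea from the reply above, written as plain Python rather than actual Hadoop or Mahout code. It assumes the goal is row-to-row cosine similarity, so the "columns" are the columns of the row-normalized matrix. All function names (normalize_rows, transpose, map_column, sum_by_key, cosine_similarities) and the num_splits parameter are made up for illustration, and the "splits" merely simulate mappers running on separate chunks of input.

```python
import math
from collections import defaultdict
from itertools import combinations_with_replacement


def normalize_rows(rows):
    """Scale each sparse row (dict: column -> value) to unit L2 norm."""
    normalized = []
    for row in rows:
        norm = math.sqrt(sum(v * v for v in row.values()))
        normalized.append({c: v / norm for c, v in row.items()} if norm > 0 else {})
    return normalized


def transpose(rows):
    """Turn a list of sparse rows into a list of sparse columns (dict: row_id -> value)."""
    columns = defaultdict(dict)
    for row_id, row in enumerate(rows):
        for col, value in row.items():
            columns[col][row_id] = value
    return list(columns.values())


def map_column(column):
    """Mapper: emit the outer product of one column with itself.

    Each output is ((i, j), a_i * a_j) with i <= j, so all multiplications
    happen here and nothing downstream needs anything but addition.
    """
    entries = sorted(column.items())
    for (i, vi), (j, vj) in combinations_with_replacement(entries, 2):
        yield (i, j), vi * vj


def sum_by_key(pairs):
    """Combiner and reducer: a per-key sum, which is commutative and associative."""
    totals = defaultdict(float)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def cosine_similarities(rows, num_splits=2):
    """Simulate the job: map each split, combine locally, then reduce globally."""
    columns = transpose(normalize_rows(rows))
    splits = [columns[k::num_splits] for k in range(num_splits)]
    # Combiner stage: partial sums within each simulated mapper split.
    partials = [
        sum_by_key(kv for column in split for kv in map_column(column))
        for split in splits
    ]
    # Reducer stage: the same sum, applied to the partial sums.
    return sum_by_key(kv for partial in partials for kv in partial.items())


if __name__ == "__main__":
    example_rows = [{"a": 1.0, "b": 1.0}, {"a": 1.0}, {"b": 2.0}]
    for pair, similarity in sorted(cosine_similarities(example_rows).items()):
        print(pair, similarity)
```

Because sum_by_key serves as both combiner and reducer, the number of splits should not change the result beyond floating-point rounding. Row pairs that share no column never get a key, which corresponds to a cosine of zero.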
