I believe what the paper is advocating is that one outputs the partial weights 
of the co-occurrences, already precomputed.  Again, it's the difference 
between emitting in the inner loop versus the outer loop of the code below.  I 
have to believe that's an order of magnitude reduction in the amount of data 
that has to be sorted, shuffled, and then reduced.  But it does preclude us 
from supporting some similarity measures, I suppose.

<code>
for (int n = 0; n < weightedOccurrences.length; n++) {
  int rowA = weightedOccurrences[n].getRow();
  double weightA = weightedOccurrences[n].getWeight();
  double valueA = weightedOccurrences[n].getValue();
  for (int m = n; m < weightedOccurrences.length; m++) {
    int rowB = weightedOccurrences[m].getRow();
    double weightB = weightedOccurrences[m].getWeight();
    double valueB = weightedOccurrences[m].getValue();
    if (rowA <= rowB) {
      rowPair.set(rowA, rowB, weightA, weightB);
      cooccurrence.set(column.get(), valueA, valueB);
    } else {
      rowPair.set(rowB, rowA, weightB, weightA);
      cooccurrence.set(column.get(), valueB, valueA);
    }
    ctx.write(rowPair, cooccurrence); // INNER LOOP: one record per pair
    numPairs++;
  }
  // VERSUS EMITTING HERE: one record per occurrence
}
</code>
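To put a rough number on that reduction: the inner loop emits one record per pair, n(n+1)/2 for a column with n occurrences, while emitting in the outer loop would be one record per occurrence.  Here's a standalone back-of-the-envelope sketch (the class name and the n = 100 figure are mine, not Mahout code):

```java
public class EmissionCount {

  // Inner-loop emission: the nested loop above writes one record for
  // every (n, m) pair with m >= n, i.e. n + (n-1) + ... + 1 records.
  static long innerLoopEmissions(int n) {
    return (long) n * (n + 1) / 2;
  }

  // Outer-loop emission: one precomputed partial-weight record per
  // occurrence in the column.
  static long outerLoopEmissions(int n) {
    return n;
  }

  public static void main(String[] args) {
    int n = 100; // hypothetical occurrences of one column
    System.out.println(innerLoopEmissions(n)); // prints 5050
    System.out.println(outerLoopEmissions(n)); // prints 100
  }
}
```

So for a column appearing in 100 rows it's 5050 records shuffled versus 100, and the gap grows quadratically for denser columns.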

-Grant

On Jul 18, 2011, at 5:47 PM, Sean Owen wrote:

> Completely agree; I had thought the suggestion was that the paper
> shows combining within one map invocation. I don't believe that's
> possible here since one map will output at most one value for any key.
> 
> On Mon, Jul 18, 2011 at 10:42 PM, Ted Dunning <[email protected]> wrote:
>> The combiner works across multiple invocations of the map function and may 
>> be applied on the reduce side as well.

--------------------------
Grant Ingersoll


