I believe what the paper is advocating is that one outputs the partial weights
of the co-occurrences, already precomputed. Again, it's the difference between
emitting in the inner loop versus the outer loop of the code below. I have to
believe that's an order-of-magnitude reduction in the amount of stuff that has
to be sorted, shuffled, and then reduced, since we'd emit one record per row
instead of one per pair. But it does preclude us from supporting some
similarity measures, I suppose.
<code>
// one map invocation: all weighted occurrences for a single column
for (int n = 0; n < weightedOccurrences.length; n++) {
  int rowA = weightedOccurrences[n].getRow();
  double weightA = weightedOccurrences[n].getWeight();
  double valueA = weightedOccurrences[n].getValue();
  for (int m = n; m < weightedOccurrences.length; m++) {
    int rowB = weightedOccurrences[m].getRow();
    double weightB = weightedOccurrences[m].getWeight();
    double valueB = weightedOccurrences[m].getValue();
    // order the pair canonically so the key always has rowA <= rowB
    if (rowA <= rowB) {
      rowPair.set(rowA, rowB, weightA, weightB);
      coocurrence.set(column.get(), valueA, valueB);
    } else {
      rowPair.set(rowB, rowA, weightB, weightA);
      coocurrence.set(column.get(), valueB, valueA);
    }
    ctx.write(rowPair, coocurrence); // INNER LOOP
    numPairs++;
  }
  // VERSUS EMITTING HERE
}
</code>
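To make that concrete, here's roughly what I mean by emitting in the outer
loop. This is just a sketch, not working code: the vector value type and the
partial-weight formula (a plain product, as for a dot-product-style measure)
are picked for illustration, and it assumes the job's output types change to
(IntWritable, VectorWritable):
<code>
// Sketch: accumulate rowA's partial co-occurrence weights into a
// sparse vector indexed by rowB, then emit once per outer iteration.
for (int n = 0; n < weightedOccurrences.length; n++) {
  int rowA = weightedOccurrences[n].getRow();
  double valueA = weightedOccurrences[n].getValue();
  Vector partials = new RandomAccessSparseVector(Integer.MAX_VALUE);
  for (int m = n; m < weightedOccurrences.length; m++) {
    int rowB = weightedOccurrences[m].getRow();
    double valueB = weightedOccurrences[m].getValue();
    partials.setQuick(rowB, valueA * valueB); // partial weight, precomputed
  }
  ctx.write(new IntWritable(rowA), new VectorWritable(partials)); // OUTER LOOP
}
</code>
Same pairs get computed, but we shuffle one record per row instead of one per
pair, and the reducer just sums the vectors it receives for each row.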
-Grant
On Jul 18, 2011, at 5:47 PM, Sean Owen wrote:
> Completely agree; I had thought the suggestion was that the paper
> shows combining within one map invocation. I don't believe that's
> possible here since one map will output at most one value for any key.
>
> On Mon, Jul 18, 2011 at 10:42 PM, Ted Dunning <[email protected]> wrote:
>> The combiner works across multiple invocations of the map function and may
>> be applied on the reduce side as well.
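For what it's worth, if the partial weights were precomputed as above, the
combiner Ted describes becomes trivial, since the values for a given pair are
just summable doubles. Sketch only; RowPairWritable is a made-up key type
standing in for whatever we key the pairs on:
<code>
import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.mapreduce.Reducer;

// Sketch: folds precomputed partial weights for the same row pair
// together across map invocations (and again on the reduce side).
public class PartialWeightCombiner
    extends Reducer<RowPairWritable, DoubleWritable, RowPairWritable, DoubleWritable> {

  private final DoubleWritable sum = new DoubleWritable();

  @Override
  protected void reduce(RowPairWritable pair, Iterable<DoubleWritable> partials, Context ctx)
      throws IOException, InterruptedException {
    double total = 0.0;
    for (DoubleWritable partial : partials) {
      total += partial.get();
    }
    sum.set(total);
    // a combiner must emit the same (key, value) types it consumes
    ctx.write(pair, sum);
  }
}
</code>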
--------------------------
Grant Ingersoll