The contents of the 2 x 2 matrix are easier to understand if we augment it
with row and column sums:

    1and2          1not2      | preferring1
    not1and2       not1not2   | not1
    --------------------------+------------
    preferring2    not2       | all

So out of this:

    1not2    = preferring1 - 1and2
    not1and2 = preferring2 - 1and2
    not2     = all - preferring2
    not1not2 = not2 - 1not2 = all - preferring1 - preferring2 + 1and2

Thus I think your code should be:

    double logLikelihood = twoLogLambda(preferring1and2,
                                        preferring1 - preferring1and2,
                                        preferring2 - preferring1and2,
                                        numUsers - preferring1 - preferring2
                                            + preferring1and2);

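A minimal Java sketch for checking the cell arithmetic on concrete numbers. The class ContingencyCells, its cells method, and the counts in main are hypothetical, chosen only for illustration; oneAndTwo stands in for 1and2, which is not a legal Java identifier.

```java
import java.util.Arrays;

/** Hypothetical helper: rebuilds the 2 x 2 table from the overlap and the marginals. */
public class ContingencyCells {

  /**
   * Returns {1and2, 1not2, not1and2, not1not2} given the number of users who
   * prefer both items, item 1, item 2, and the total number of users.
   */
  static int[] cells(int oneAndTwo, int preferring1, int preferring2, int all) {
    int oneNotTwo = preferring1 - oneAndTwo;     // row 1 sum minus the overlap
    int notOneAndTwo = preferring2 - oneAndTwo;  // column 1 sum minus the overlap
    int notTwo = all - preferring2;              // column 2 sum
    int notOneNotTwo = notTwo - oneNotTwo;       // = all - preferring1 - preferring2 + oneAndTwo
    return new int[] {oneAndTwo, oneNotTwo, notOneAndTwo, notOneNotTwo};
  }

  public static void main(String[] args) {
    // Illustrative counts: 100 users, 30 prefer item 1, 40 prefer item 2, 10 prefer both
    int[] k = cells(10, 30, 40, 100);
    System.out.println(Arrays.toString(k));      // prints [10, 20, 30, 40]
    // The four cells must add back up to the total
    System.out.println(Arrays.stream(k).sum());  // prints 100
  }
}
```

The four values come out in the same order as the arguments of the twoLogLambda call (k11, k12, k21, k22).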
I find it easier to understand the twoLogLambda code if it is written this
way:
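
    // G^2 (twice the log-likelihood ratio) for a 2 x 2 table of counts:
    // the cells' kLogP minus the kLogP of the row sums and of the column sums
    static double twoLogLambda(int k11, int k12, int k21, int k22) {
      return 2 * (kLogP(k11, k12, k21, k22)
                  - kLogP(k11 + k12, k21 + k22)
                  - kLogP(k11 + k21, k12 + k22));
    }

    // Sum of x * log(x / total) over the counts, i.e. the maximized multinomial
    // log-likelihood without its constant; zero counts contribute nothing
    static double kLogP(int... values) {
      double total = 0;
      for (int x : values) {
        total += x;
      }
      double result = 0;
      for (int x : values) {
        if (x > 0) {
          result += x * Math.log(x / total);
        }
      }
      return result;
    }
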
We have code in Mahout that does something like this. It is also
essentially the same as the R code I gave earlier.
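Expanding kLogP shows that this twoLogLambda is the standard G^2 (log-likelihood ratio) statistic for a 2 x 2 table. A derivation sketch, writing $k_{ij}$ for the cells, $R_i$ and $C_j$ for the row and column sums, and $N$ for the grand total (notation introduced here, not taken from the code), and using $\sum_{ij} k_{ij} = \sum_i R_i = \sum_j C_j = N$:

```latex
\begin{aligned}
\tfrac{1}{2}G^2
  &= \sum_{ij} k_{ij}\log\frac{k_{ij}}{N}
   - \sum_i R_i\log\frac{R_i}{N}
   - \sum_j C_j\log\frac{C_j}{N} \\
  &= \sum_{ij} k_{ij}\log k_{ij} - \sum_i R_i\log R_i - \sum_j C_j\log C_j + N\log N \\
  &= \sum_{ij} k_{ij}\log\frac{k_{ij}\,N}{R_i\,C_j}
\end{aligned}
```

The last line is the familiar 2 * sum O log(O/E) form with expected counts E_ij = R_i C_j / N. Zero cells contribute 0 by convention, which is what the x > 0 guard in kLogP implements.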
On Sat, Jan 29, 2011 at 11:09 AM, Sean Owen wrote:
> Maybe the formulation in the code now is slightly wrong. The key math is:
>
> double logLikelihood = twoLogLambda(preferring1and2,
> preferring1 - preferring1and2,
> preferring2,
> numUsers - preferring2);
>
>
> static double twoLogLambda(double k1, double k2, double n1, double n2) {
> double p = (k1 + k2) / (n1 + n2);
> return 2.0 * (logL(k1 / n1, k1, n1)
> + logL(k2 / n2, k2, n2)
> - logL(p, k1, n1)
> - logL(p, k2, n2));
> }
>
> private static double logL(double p, double k, double n) {
> return k * Math.log(p) + (n - k) * Math.log(1.0 - p);
> }
>
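One way to compare the quoted two-binomial form with the four-cell form: with k1 = 1and2, n1 = preferring2, k2 = 1not2 and n2 = not2, both expand to the same sum, so they should agree whenever every cell is nonzero. They part ways when a cell is zero: logL then evaluates 0 * Math.log(0.0), which is NaN in Java, while kLogP simply skips zero counts. The harness below is a hypothetical sketch (the class name, the viaBinomial/viaCells wrappers, and the counts are invented for illustration) that checks both points numerically.

```java
/** Hypothetical harness comparing the two formulations discussed in this thread. */
public class LlrComparison {

  // Two-binomial form, as in the quoted code
  static double binomialTwoLogLambda(double k1, double k2, double n1, double n2) {
    double p = (k1 + k2) / (n1 + n2);
    return 2.0 * (logL(k1 / n1, k1, n1) + logL(k2 / n2, k2, n2)
        - logL(p, k1, n1) - logL(p, k2, n2));
  }

  // NaN when k == 0 and p == 0 (or k == n and p == 1): 0 * -Infinity
  private static double logL(double p, double k, double n) {
    return k * Math.log(p) + (n - k) * Math.log(1.0 - p);
  }

  // Four-cell form
  static double cellTwoLogLambda(int k11, int k12, int k21, int k22) {
    return 2 * (kLogP(k11, k12, k21, k22)
        - kLogP(k11 + k12, k21 + k22)
        - kLogP(k11 + k21, k12 + k22));
  }

  static double kLogP(int... values) {
    double total = 0;
    for (int x : values) {
      total += x;
    }
    double result = 0;
    for (int x : values) {
      if (x > 0) {  // zero counts contribute nothing
        result += x * Math.log(x / total);
      }
    }
    return result;
  }

  // Arguments as in the quoted call
  static double viaBinomial(int oneAndTwo, int preferring1, int preferring2, int numUsers) {
    return binomialTwoLogLambda(oneAndTwo, preferring1 - oneAndTwo,
        preferring2, numUsers - preferring2);
  }

  // Arguments as in the proposed call
  static double viaCells(int oneAndTwo, int preferring1, int preferring2, int numUsers) {
    return cellTwoLogLambda(oneAndTwo, preferring1 - oneAndTwo,
        preferring2 - oneAndTwo, numUsers - preferring1 - preferring2 + oneAndTwo);
  }

  public static void main(String[] args) {
    // Illustrative counts with no zero cells: the two forms should agree up to rounding
    System.out.println(viaBinomial(10, 30, 40, 100) + " vs " + viaCells(10, 30, 40, 100));
    // No user prefers both items, so one cell is 0: the binomial form yields NaN
    System.out.println(viaBinomial(0, 30, 40, 100) + " vs " + viaCells(0, 30, 40, 100));
  }
}
```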