Jeff Eastman wrote:
Ted Dunning wrote:
This could also be caused if the prior is very diffuse. This makes the
probability that a point will go to any new cluster quite low. You can
compensate somewhat for this with different values of alpha.
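For intuition (my own illustration, not from the thread): under the stick-breaking prior each beta_k ~ Beta(1, alpha) has mean 1/(1+alpha), so a small alpha concentrates the expected mixture weight on the first few components while a large alpha spreads it out, making new clusters more likely. A minimal sketch:

```java
// Sketch (hypothetical helper, not Mahout code): expected stick-breaking
// weights E[pi_k] under beta_k ~ Beta(1, alpha), where E[beta_k] = 1/(1+alpha).
public class AlphaIntuition {
  // Expected weight of component k (1-based) in the stick-breaking prior:
  // E[pi_k] = E[beta] * (1 - E[beta])^(k-1), using independence of the betas.
  static double expectedWeight(int k, double alpha) {
    double eBeta = 1.0 / (1.0 + alpha);
    return eBeta * Math.pow(1.0 - eBeta, k - 1);
  }

  public static void main(String[] args) {
    for (double alpha : new double[] {0.1, 1.0, 10.0}) {
      System.out.printf("alpha=%5.1f  E[pi_1]=%.3f  E[pi_5]=%.3f%n",
          alpha, expectedWeight(1, alpha), expectedWeight(5, alpha));
    }
  }
}
```

With alpha = 0.1 nearly all expected mass sits on the first component; with alpha = 10 the first component gets under a tenth of it, leaving room for points to land in later (new) clusters.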
Could you elaborate more on the function of alpha in the algorithm?
Looking at the current implementation, it is only used to initialize
the totalCount values (to alpha/k) when sampling from the prior.
AFAICT it is not used anywhere else. Its current role is pretty
minimal and I wonder if something fell through the cracks during all
of the refactoring from the R prototype.
Well, I looked over the R code and alpha_0 does appear to be used in two
places, not one:
- in state initialization "beta = rbeta(K, 1, alpha_0)" [K is the number
of models]
- during state update "beta[k] = rbeta(1, 1 + counts[k], alpha_0 +
N-counts[k])" [N is the cardinality of the sample vector and counts
corresponds to totalCounts in the implementation]
The value of beta[k] is then used in the Dirichlet distribution
calculation, which produces the mixture probabilities pi[k], for the
iteration:
other = 1  # product accumulator
for (k in 1:K) {
  pi[k] = beta[k] * other  # beta_k * prod_{n<k} (1 - beta_n)
  other = other * (1 - beta[k])
}
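A direct Java transcription of that loop (my sketch, assuming the beta values have already been sampled, e.g. via rbeta(1, 1 + counts[k], alpha_0 + N - counts[k]) as in the R state update) would be:

```java
// Sketch: the R stick-breaking loop transcribed to Java. The beta array is
// assumed to have been sampled already; this only computes the weights.
public class StickBreaking {
  static double[] mixtureWeights(double[] beta) {
    double[] pi = new double[beta.length];
    double other = 1.0;  // product accumulator, prod_{n<k} (1 - beta_n)
    for (int k = 0; k < beta.length; k++) {
      pi[k] = beta[k] * other;
      other *= (1.0 - beta[k]);
    }
    return pi;
  }
}
```

For example, beta = {0.5, 0.5, 1.0} yields pi = {0.5, 0.25, 0.25}; the weights sum to 1 whenever the final beta is 1, and otherwise leave a remainder for unseen components.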
Alpha_0 does not appear to ever be added to the total counts, nor is it
divided by K as in the implementation, so it looks like something did get
lost in the refactoring. In the implementation,
UncommonDistributions.rDirichlet(Vector alpha) is passed the totalCounts
to compute the mixture probabilities, and the rBeta arguments do not use
alpha_0 as in R. There are other differences as well; rDirichlet
looks like:
public static Vector rDirichlet(Vector alpha) {
  Vector r = alpha.like();
  double total = alpha.zSum();
  double remainder = 1;
  for (int i = 0; i < r.size(); i++) {
    double a = alpha.get(i);
    total -= a;
    double beta = rBeta(a, Math.max(0, total));
    double p = beta * remainder;
    r.set(i, p);
    remainder -= p;
  }
  return r;
}
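For comparison, here is a sketch of what an update following the R code would look like, drawing each beta from Beta(1 + counts[k], alpha_0 + N - counts[k]) instead of passing totalCounts through rDirichlet. This is my own illustration, not the Mahout implementation: rBetaMean is a deterministic stand-in for a real Beta sampler (it returns the distribution's mean a/(a+b)) purely so the example runs without an RNG.

```java
// Sketch of the R-style state update; not Mahout code.
public class RStyleUpdate {
  // Deterministic stand-in for rBeta: the mean of Beta(a, b).
  static double rBetaMean(double a, double b) {
    return a / (a + b);
  }

  // counts[k] corresponds to totalCounts per model; n is the cardinality
  // of the sample vector, alpha0 the concentration parameter.
  static double[] mixtureWeights(double[] counts, int n, double alpha0) {
    double[] pi = new double[counts.length];
    double other = 1.0;  // product accumulator, as in the R loop
    for (int k = 0; k < counts.length; k++) {
      // beta[k] = rbeta(1, 1 + counts[k], alpha_0 + N - counts[k]) in R
      double beta = rBetaMean(1.0 + counts[k], alpha0 + n - counts[k]);
      pi[k] = beta * other;
      other *= (1.0 - beta);
    }
    return pi;
  }
}
```

Note how alpha_0 enters each rBeta call directly, rather than being divided by K and folded into the totalCount initialization as the current implementation does.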