On Mon, Feb 18, 2008 at 5:20 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
> No problems yet. I've been doing some background reading, etc. and
> haven't had much coding time. Biggest issue is finding time, if you
> want to seed something, by all means go for it with a reference
> implementation. We can make it M/R from there.

If you speak Matlab, here are implementations for the multi-variate
Bernoulli (binary) and multinomial (count) models:

http://people.csail.mit.edu/jrennie/matlab/nbBinary.m
http://people.csail.mit.edu/jrennie/matlab/nbMulti.m

If you don't speak Matlab, it might be worth trying to learn the
basics. (Note that Octave is a free, open source, largely
Matlab-compatible alternative; just make sure to use a recent release,
2.9.x or later.) ML algorithms are often most efficiently implemented
in matrix form, and Matlab is quite popular in the ML community.
Here's a quickie tutorial I wrote for an ML class I TA'd:

http://www.ai.mit.edu/courses/6.891-f00/matlab/matrix.m

Additional resources:

http://www.ai.mit.edu/courses/6.891-f00/matlab.html

We really ought to get a sparse matrix library of some sort in place
before all these algorithms are implemented (even an interface would
be helpful). Otherwise, many of them will have to be largely
rewritten to make them efficient. And I'm guessing that making the
matrix library M/R compatible is the hard part. Once that's in place,
we can just port serial implementations to the M/R matrix lib and
voilà, parallel ML algorithms galore :)

'Course, maybe I'm getting ahead of myself...

> There is no code checked in yet. I am about to commit MAHOUT-3 soon
> (clustering, not NB), though, which will likely be the first commit of
> any code.

What does one have to do to get access to the SVN repo?

Jason
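
To make the matrix-form and sparse-interface points a bit more concrete, here is a rough sketch in Python. It is not a translation of nbMulti.m and not a proposed Mahout API; the names (SparseMatrix, times_dense, train_multinomial_nb, predict) and the add-one smoothing default are assumptions made up for illustration. The idea is only that multinomial NB scoring reduces to one sparse-times-dense product (document-by-word counts times word-by-class log-likelihoods), so an implementation written against a small matrix interface would not need rewriting if the interface later got a parallel backend.

```python
import math


class SparseMatrix:
    """Hypothetical minimal sparse matrix: rows[i] maps column index -> nonzero value."""

    def __init__(self, n_rows, n_cols):
        self.n_rows = n_rows
        self.n_cols = n_cols
        self.rows = [dict() for _ in range(n_rows)]

    def set(self, i, j, value):
        # Store only nonzero entries; setting a zero removes any stored entry.
        if value != 0:
            self.rows[i][j] = value
        else:
            self.rows[i].pop(j, None)

    def times_dense(self, dense):
        """Return self * dense, where dense is a list of n_cols rows (lists of equal width)."""
        width = len(dense[0])
        out = [[0.0] * width for _ in range(self.n_rows)]
        for i, row in enumerate(self.rows):
            for j, v in row.items():  # touches nonzero entries only
                for k in range(width):
                    out[i][k] += v * dense[j][k]
        return out


def train_multinomial_nb(counts, labels, n_classes, alpha=1.0):
    """Estimate log priors and smoothed log P(word | class).

    Assumes every class has at least one training document.
    """
    class_word = [[alpha] * counts.n_cols for _ in range(n_classes)]
    class_docs = [0] * n_classes
    for i, row in enumerate(counts.rows):
        c = labels[i]
        class_docs[c] += 1
        for j, v in row.items():
            class_word[c][j] += v
    log_prior = [math.log(d / counts.n_rows) for d in class_docs]
    # Stored word-by-class so that scoring is simply counts * log_lik.
    log_lik = [[0.0] * n_classes for _ in range(counts.n_cols)]
    for c in range(n_classes):
        total = sum(class_word[c])
        for j in range(counts.n_cols):
            log_lik[j][c] = math.log(class_word[c][j] / total)
    return log_prior, log_lik


def predict(counts, log_prior, log_lik):
    """Pick the class with the highest log prior plus log-likelihood for each document."""
    scores = counts.times_dense(log_lik)
    preds = []
    for row in scores:
        totals = [s + p for s, p in zip(row, log_prior)]
        preds.append(max(range(len(totals)), key=totals.__getitem__))
    return preds


# Toy data: documents 0 and 1 mostly use word 0; documents 2 and 3 mostly use word 2.
X = SparseMatrix(4, 3)
for i, j, v in [(0, 0, 3), (0, 1, 1), (1, 0, 2), (2, 2, 4), (2, 1, 1), (3, 2, 2)]:
    X.set(i, j, v)
y = [0, 0, 1, 1]
log_prior, log_lik = train_multinomial_nb(X, y, n_classes=2)
print(predict(X, log_prior, log_lik))
```

Only times_dense would need an M/R-aware implementation here; the training and prediction code talks to the interface and never to the storage layout.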
