On Mon, Feb 18, 2008 at 5:20 PM, Grant Ingersoll <[EMAIL PROTECTED]>
wrote:

> No problems yet.  I've been doing some background reading, etc. and
> haven't had much coding time.  Biggest issue is finding time, if you
> want to seed something, by all means go for it with a reference
> implementation.  We can make it M/R from there.


If you speak Matlab, here are naive Bayes implementations for the
multivariate Bernoulli (binary) and multinomial (count) models:

http://people.csail.mit.edu/jrennie/matlab/nbBinary.m
http://people.csail.mit.edu/jrennie/matlab/nbMulti.m
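
For readers who don't read Matlab, the multinomial (count) model can be
sketched in a few lines of plain Python. This is only an illustration, not a
port of nbMulti.m: the function names, the dict-of-counts document format,
and the add-alpha smoothing are all assumptions made for the example.

```python
import math
from collections import defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit a multinomial naive Bayes model.

    docs   -- list of {term: count} dicts (hypothetical input format)
    labels -- list of class labels, one per document
    alpha  -- additive smoothing constant (assumed; 1.0 is add-one smoothing)
    """
    vocab = {t for d in docs for t in d}
    class_docs = defaultdict(int)                           # documents per class
    term_counts = defaultdict(lambda: defaultdict(float))   # class -> term -> count
    for d, y in zip(docs, labels):
        class_docs[y] += 1
        for t, c in d.items():
            term_counts[y][t] += c

    n = len(docs)
    log_prior = {y: math.log(k / n) for y, k in class_docs.items()}
    log_lik = {}
    for y in class_docs:
        # Smoothed total so the per-class term probabilities sum to one.
        total = sum(term_counts[y].values()) + alpha * len(vocab)
        log_lik[y] = {
            t: math.log((term_counts[y][t] + alpha) / total) for t in vocab
        }
    return log_prior, log_lik


def classify(doc, log_prior, log_lik):
    """Return the class with the highest log posterior; unseen terms are skipped."""
    def score(y):
        return log_prior[y] + sum(
            c * log_lik[y][t] for t, c in doc.items() if t in log_lik[y]
        )
    return max(log_prior, key=score)
```

The Bernoulli (binary) variant would record only whether each term is present
in a document and would also score the terms that are absent.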

If you don't speak Matlab, it might be worth learning the basics.  (Note
that Octave is a free, open-source, largely Matlab-compatible alternative;
just make sure to use a recent release, 2.9.x or later.)  ML algorithms are
often most efficiently implemented in matrix form, and Matlab is quite
popular in the ML community.  Here's a quick tutorial I wrote for an ML
class I TA'd:

http://www.ai.mit.edu/courses/6.891-f00/matlab/matrix.m

Additional resources:

http://www.ai.mit.edu/courses/6.891-f00/matlab.html

We really ought to get a sparse matrix library of some sort in place before
all these algorithms are implemented (even an interface would be helpful).
Otherwise, many of them will have to be largely rewritten to make them
efficient.  I'm guessing that making the matrix library M/R compatible is
the hard part.  Once that's in place, we can just port serial
implementations to the M/R matrix lib and voilà: parallel ML algorithms
galore :)  'Course, maybe I'm getting ahead of myself...
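
As a rough illustration of what such an interface might look like, here is a
minimal Python sketch. Everything in it is hypothetical (the SparseMatrix
class, the dict-of-rows storage, and the map_row/reduce_sum helpers); it does
not use any Hadoop API. It only shows that a matrix-vector product splits
into independent per-row "map" steps followed by a "reduce" that collects the
partial results.

```python
from collections import defaultdict


class SparseMatrix:
    """Hypothetical dict-of-rows sparse matrix; only nonzeros are stored."""

    def __init__(self, n_rows, n_cols):
        self.n_rows = n_rows
        self.n_cols = n_cols
        self.rows = defaultdict(dict)   # row index -> {column index: value}

    def set(self, i, j, value):
        if value != 0:
            self.rows[i][j] = value
        else:
            # Storing a zero means dropping the entry if it exists.
            self.rows.get(i, {}).pop(j, None)

    def get(self, i, j):
        return self.rows.get(i, {}).get(j, 0.0)

    def iter_rows(self):
        """Yield (row index, {column index: value}) for non-empty rows."""
        return self.rows.items()


def map_row(i, row, x):
    """Map step: emit (row index, dot product of one sparse row with x)."""
    yield i, sum(v * x[j] for j, v in row.items())


def reduce_sum(pairs, n_rows):
    """Reduce step: accumulate emitted values into a dense result vector."""
    y = [0.0] * n_rows
    for i, val in pairs:
        y[i] += val
    return y


def matvec(m, x):
    """Compute m * x by chaining the map and reduce steps serially."""
    pairs = (kv for i, row in m.iter_rows() for kv in map_row(i, row, x))
    return reduce_sum(pairs, m.n_rows)
```

Because each map_row call touches only one row, the rows could in principle
be processed on different machines; a serial algorithm written against
matvec would not need to know how the work was distributed.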

> There is no code checked in yet.  I am about to commit MAHOUT-3 soon
> (clustering, not NB), though, which will likely be the first commit of
> any code.
>


What does one have to do to get access to the SVN repo?

Jason
