[ 
https://issues.apache.org/jira/browse/MAHOUT-30?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Eastman updated MAHOUT-30:
-------------------------------

    Attachment: MAHOUT-30.patch

Here is a work-in-progress implementation of Dirichlet Process Clustering that
Ted Dunning has been coaching me to write. It originated as an R prototype that
he wrote and that I translated into Java using our nascent Vector package.
Ted then refactored the Java to introduce better abstractions, since the R
implementation had no objects and my reverse-engineered abstractions were
rather clunky. Finally, I got the new implementation working with a pluggable
distributions framework based on either commons-math (plus Ted's patches to
it) or the unmodified blog-0.2 framework.

I am posting this to the list in hopes of generating more interest from the 
larger Mahout community. It has taken me literally months to wrap my mind 
around this approach. Enjoy.

To run this patch you will need to download the BLOG package from
http://people.csail.mit.edu/milch/blog. Here is the beginning of the README
file that came with the distribution:
=====
Bayesian Logic (BLOG) Inference Engine version 0.2

Copyright (c) 2007, Massachusetts Institute of Technology
Copyright (c) 2005, 2006, Regents of the University of California
All rights reserved.  This software is distributed under the license 
included in LICENSE.txt.

Lead author: Brian Milch, [EMAIL PROTECTED]
Supervisors: Prof. Stuart Russell (Berkeley), Prof. Leslie Kaelbling (MIT)
Contributors: Bhaskara Marthi, Andrey Kolobov, David Sontag, Daniel L. Ong,
    Brendan Clark
=====

Jeff

> dirichlet process implementation
> --------------------------------
>
>                 Key: MAHOUT-30
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-30
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Clustering
>            Reporter: Isabel Drost
>         Attachments: MAHOUT-30.patch
>
>
> Copied over from original issue:
> > Further extension can also be made by assuming an infinite mixture model.
> > The implementation is only slightly more difficult and the result is a
> > (nearly) non-parametric clustering algorithm.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
