[ https://issues.apache.org/jira/browse/MAHOUT-30?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jeff Eastman updated MAHOUT-30:
-------------------------------
Attachment: MAHOUT-30.patch
Here's a work-in-progress Dirichlet Process Clustering algorithm that Ted
Dunning has been coaching me to write. It originated from an R prototype that
he wrote and that I translated into Java using our nascent Vector package.
Ted then refactored the Java to introduce better abstractions, since the R
implementation had no objects and my reverse-engineered abstractions were
rather clunky. Finally, I got the new implementation working with a pluggable
distributions framework based on either commons-math (plus Ted's patches to
it) or the vanilla blog-0.2 framework.
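For readers new to the approach, here is a minimal sketch of the prior behind Dirichlet process clustering, written in its Chinese restaurant process form. It is not taken from MAHOUT-30.patch, and the patch may well organize the computation differently. The class name DirichletSketch, the method names, and the concentration parameter alpha are all hypothetical, chosen for illustration only. A point joins an existing cluster with probability proportional to that cluster's size, and opens a new cluster with probability proportional to alpha.

```java
import java.util.Random;

// Illustrative sketch only; names are hypothetical and not from MAHOUT-30.patch.
class DirichletSketch {

    // Chinese restaurant process prior over cluster assignments.
    // Entry k < counts.length is the probability of joining existing cluster k;
    // the last entry is the probability of opening a new cluster.
    static double[] priorWeights(int[] counts, double alpha) {
        int n = 0;
        for (int c : counts) {
            n += c;
        }
        double[] weights = new double[counts.length + 1];
        for (int k = 0; k < counts.length; k++) {
            weights[k] = counts[k] / (n + alpha);
        }
        weights[counts.length] = alpha / (n + alpha);
        return weights;
    }

    // Draw an index from a normalized discrete distribution.
    static int sample(double[] weights, Random rng) {
        double u = rng.nextDouble();
        double cumulative = 0.0;
        for (int k = 0; k < weights.length; k++) {
            cumulative += weights[k];
            if (u < cumulative) {
                return k;
            }
        }
        return weights.length - 1; // guard against floating-point round-off
    }

    public static void main(String[] args) {
        int[] counts = {3, 1};  // two existing clusters holding 3 and 1 points
        double alpha = 1.0;     // assumed concentration parameter
        double[] weights = priorWeights(counts, alpha);
        int choice = sample(weights, new Random(42));
        System.out.println("chosen cluster index: " + choice);
    }
}
```

In a full sampler these prior weights would be multiplied by each cluster's likelihood for the point being assigned, which is the role a pluggable distribution would play. That step is omitted here.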
I am posting this to the list in hopes of generating more interest from the
larger Mahout community. It has taken me literally months to wrap my mind
around this approach. Enjoy.
To run this patch you will need to get the blog package at
http://people.csail.mit.edu/milch/blog. Here is the beginning of the README
file that came with the distribution:
=====
Bayesian Logic (BLOG) Inference Engine version 0.2
Copyright (c) 2007, Massachusetts Institute of Technology
Copyright (c) 2005, 2006, Regents of the University of California
All rights reserved. This software is distributed under the license
included in LICENSE.txt.
Lead author: Brian Milch, [EMAIL PROTECTED]
Supervisors: Prof. Stuart Russell (Berkeley), Prof. Leslie Kaelbling (MIT)
Contributors: Bhaskara Marthi, Andrey Kolobov, David Sontag, Daniel L. Ong,
Brendan Clark
=====
Jeff
> dirichlet process implementation
> --------------------------------
>
> Key: MAHOUT-30
> URL: https://issues.apache.org/jira/browse/MAHOUT-30
> Project: Mahout
> Issue Type: New Feature
> Components: Clustering
> Reporter: Isabel Drost
> Attachments: MAHOUT-30.patch
>
>
> Copied over from original issue:
> > Further extension can also be made by assuming an infinite mixture model.
> > The implementation is only slightly more difficult and the result is a
> > (nearly) non-parametric clustering algorithm.