[jira] Commented: (MAHOUT-30) dirichlet process implementation

Isabel Drost (JIRA) Thu, 25 Dec 2008 17:27:10 -0800

    [ 
https://issues.apache.org/jira/browse/MAHOUT-30?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12659218#action_12659218
 ]


Isabel Drost commented on MAHOUT-30:
------------------------------------

First of all: Great work, Jeff. I finally found some time to have a closer look 
at the code. To me it already looks pretty clear and easy to understand. Some 
minor comments:

I did not have a close look at the code for displaying the clustering process 
so far. If it is to be retained in the final version it might be a good idea to 
move that into its own package?

I was wondering why you wrapped two math packages (BLOG and commons-math). 
Maybe it would help if you briefly named the advantages and shortcomings of 
each? I could not find a pointer to Ted's patch to commons-math. Maybe I just 
overlooked it?

The patch does not include changes to the dependencies in the pom.xml. I guess 
you would check the libraries the patch depends on into our libs directory? At 
first sight the licenses of BLOG and its dependencies seem fine; has anyone 
else verified this?

Personally, I would love to see the mathematical formulae behind the 
implementation in the docs as well, or perhaps a pointer to a book chapter, 
publication, or other source that explains the algorithm in more detail.

Looking through the code, I found it a little confusing to have the classes 
ModelDistribution, NormalModelDistribution, and DirichletDistribution around. 
As the only method in the interface ModelDistribution is sampleFromPrior, it 
might be clearer if it were named ModelSampler? But maybe it is just the time 
of day I looked at it...
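
To make the naming suggestion concrete, here is a minimal sketch of what the 
rename could look like. Only ModelDistribution, NormalModelDistribution, and 
sampleFromPrior come from the patch; the signature of sampleFromPrior, the 
NormalModel class, the NormalModelSampler name, and the prior used are 
assumptions for illustration, not the actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical rename of ModelDistribution: the name says what the single method does. */
interface ModelSampler<M> {
    /** Draws howMany models from the prior. This signature is an assumption. */
    List<M> sampleFromPrior(int howMany);
}

/** Illustrative stand-in for a model; not a class from the patch. */
final class NormalModel {
    final double mean;
    final double stdDev;

    NormalModel(double mean, double stdDev) {
        this.mean = mean;
        this.stdDev = stdDev;
    }
}

/** Hypothetical counterpart of NormalModelDistribution under the new name. */
final class NormalModelSampler implements ModelSampler<NormalModel> {
    private final Random random;

    NormalModelSampler(long seed) {
        this.random = new Random(seed);
    }

    @Override
    public List<NormalModel> sampleFromPrior(int howMany) {
        List<NormalModel> models = new ArrayList<>(howMany);
        for (int i = 0; i < howMany; i++) {
            // Prior chosen only for illustration: mean ~ N(0, 1), unit standard deviation.
            models.add(new NormalModel(random.nextGaussian(), 1.0));
        }
        return models;
    }
}
```

With names like these, DirichletDistribution would be the only class still 
called a distribution, which might make the roles easier to tell apart.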



> dirichlet process implementation
> --------------------------------
>
>                 Key: MAHOUT-30
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-30
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Clustering
>            Reporter: Isabel Drost
>            Assignee: Jeff Eastman
>         Attachments: MAHOUT-30.patch, MAHOUT-30b.patch, MAHOUT-30c.patch
>
>
> Copied over from original issue:
> > Further extension can also be made by assuming an infinite mixture model. 
> > The implementation is only slightly more difficult and the result is a 
> > (nearly) non-parametric clustering algorithm.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
