[
https://issues.apache.org/jira/browse/MAHOUT-30?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12659218#action_12659218
]
Isabel Drost commented on MAHOUT-30:
------------------------------------
First of all: Great work, Jeff. I finally found some time to have a closer look
at the code. To me it already looks pretty clear and easy to understand. Some
minor comments:
I did not have a close look at the code for displaying the clustering process
so far. If it is to be retained in the final version it might be a good idea to
move that into its own package?
I was wondering why you wrapped two math packages (BLOG and commons-math).
Maybe it would help if you briefly named the advantages and shortcomings of
each? I was missing a pointer to Ted's patch to commons-math. Maybe I just
overlooked it?
In the patch I am missing changes to the dependencies in the pom.xml. I guess
you would check the libraries the patch depends on into our libs directory?
At first sight the licenses of BLOG and its dependencies seem fine; has anyone
else verified this?
Personally, I would love to see the mathematical formulae behind the
implementation in the docs as well, or maybe a pointer to a book chapter,
publication, or other source that explains the algorithm in more detail.
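As a possible starting point for such docs, the generic textbook form of a
Dirichlet process mixture model is sketched below. The symbols (alpha for the
concentration parameter, G_0 for the base distribution, F for the component
likelihood, z_i for the cluster assignment of point i) are my own notation and
an assumption; the patch may well use a different formulation or a finite
approximation.

```latex
\begin{align*}
G &\sim \mathrm{DP}(\alpha, G_0) \\
\theta_i \mid G &\sim G \\
x_i \mid \theta_i &\sim F(\theta_i) \\[1ex]
P(z_i = k \mid z_{-i}) &=
  \begin{cases}
    \dfrac{n_{-i,k}}{n - 1 + \alpha} & \text{for an existing cluster } k \\[2ex]
    \dfrac{\alpha}{n - 1 + \alpha}   & \text{for a new cluster}
  \end{cases}
\end{align*}
```

Here n is the number of data points and n_{-i,k} is the number of points other
than i currently assigned to cluster k.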
Looking through the code, I found it a little confusing to have classes
ModelDistribution, NormalModelDistribution, and DirichletDistribution around.
As the only method in the interface ModelDistribution is sampleFromPrior, it
might be clearer if it were named ModelSampler? But maybe it is just the time
of day I looked at it...
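To make the naming suggestion concrete, here is a minimal sketch of how the
renamed interface might read. Only the names ModelSampler and sampleFromPrior
come from the discussion above; the method signature, the generic parameter,
and the NormalModel and NormalModelSampler classes are hypothetical and not
taken from the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class ModelSamplerSketch {

    /** Proposed name for the interface the patch calls ModelDistribution. */
    interface ModelSampler<M> {
        // Assumed signature: only the method name is taken from the patch.
        List<M> sampleFromPrior(int howMany);
    }

    /** Hypothetical one-dimensional normal model, for illustration only. */
    static final class NormalModel {
        final double mean;
        final double stdDev;

        NormalModel(double mean, double stdDev) {
            this.mean = mean;
            this.stdDev = stdDev;
        }
    }

    /** Hypothetical sampler: means drawn from a standard normal prior, std dev fixed at 1. */
    static final class NormalModelSampler implements ModelSampler<NormalModel> {
        private final Random random;

        NormalModelSampler(long seed) {
            this.random = new Random(seed);
        }

        @Override
        public List<NormalModel> sampleFromPrior(int howMany) {
            List<NormalModel> models = new ArrayList<>(howMany);
            for (int i = 0; i < howMany; i++) {
                models.add(new NormalModel(random.nextGaussian(), 1.0));
            }
            return models;
        }
    }

    public static void main(String[] args) {
        ModelSampler<NormalModel> sampler = new NormalModelSampler(42L);
        List<NormalModel> models = sampler.sampleFromPrior(3);
        System.out.println(models.size() + " models sampled from the prior");
    }
}
```

With this naming, the interface describes what it does (it samples models),
while the "Distribution" suffix stays reserved for classes that actually
represent a probability distribution.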
> dirichlet process implementation
> --------------------------------
>
> Key: MAHOUT-30
> URL: https://issues.apache.org/jira/browse/MAHOUT-30
> Project: Mahout
> Issue Type: New Feature
> Components: Clustering
> Reporter: Isabel Drost
> Assignee: Jeff Eastman
> Attachments: MAHOUT-30.patch, MAHOUT-30b.patch, MAHOUT-30c.patch
>
>
> Copied over from original issue:
> > Further extension can also be made by assuming an infinite mixture model.
> > The implementation is only slightly more difficult and the result is a
> > (nearly) non-parametric clustering algorithm.