[
https://issues.apache.org/jira/browse/MATH-817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494552#comment-13494552
]
Gilles commented on MATH-817:
-----------------------------
Quite a few remarks, in no particular order.
In "MultivariateNormalMixtureExpectationMaximizationFitter", wouldn't it be
possible to pass the distributions as arguments to the constructor (similar to
what was eventually done for "MixtureMultivariateRealDistribution")?
"(Math)Arrays.copyOf" is preferred over "System.arraycopy".
"equals" in "MultivariateNormalDistribution":
* Unneeded checks (means and covariance matrix arrays cannot be null)
* Comparing entries of the sampling matrix is perhaps enough to establish the
equivalence relation (?)
Why all the
{noformat}
@SuppressWarnings("unused")
{noformat}
in "MultivariateNormalMixtureExpectationMaximizationFitterTest"?
"MathArrays.copyOf" is preferred over "System.arraycopy".
It would be better to use "RealVector" and "RealMatrix" objects (in place of 1D
and 2D arrays) whenever appropriate. Their methods can make the code tighter
(e.g. avoiding many explicit loops).
At first sight, I don't think that defining "equals" and "hashCode" should be
mandatory (i.e. those methods should not be declared in the abstract base
class).
There is some odd code formatting.
Why two RNG? Wouldn't be clearer to define a list of "NormalDistribution"
objects from which to sample (instead of using "nextGaussian")?
Documentation is missing for the "@param" and "@return" tags.
The "stillFitting" flag could be avoided by using a label on the outer loop (in
method "fit").
"catch (Exception e)" is annoying.
I wonder whether the class would be more flexible if the "data" is not passed
to the constructor: the "fit" method could be "public" and would return the
fitted mixture.
Mixing data with different meanings in the same array is not a good idea:
{code}
final double[][] unsortedData = new double[data.length][numDataColumns + 1];
{code}
(where the entry at index "numDataColumns" will contain the mean of the values
at index 0 .. numDataColumns - 1). If you want to associate different data, put
them in a dedicated class e.g.:
{code}
private static class DataRow implements Comparable<DataRow> {
private double[] row;
private Double mean;
DataRow(double[] data) {
// Store reference.
row = data;
// Compute mean.
mean = ... ;
}
public int compareTo(DataRow other) {
return mean.compareTo(other.mean);
}
}
{code}
Then, the above line would become:
{code}
final DataRow[] unsortedData = new DataRow[data.length];
{code}
and you can use "Arrays.sort" without specifying a "Comparator" and without
copying data back and forth.
IIUC, the "fit" can return quietly from satisfying either of two criteria:
* convergence threshold
* number of iterations
but no exception is thrown to signal that the fit does not meet the convergence
criterion.
> Multivariate Normal Mixture Model Fitting by Expectation Maximization
> ---------------------------------------------------------------------
>
> Key: MATH-817
> URL: https://issues.apache.org/jira/browse/MATH-817
> Project: Commons Math
> Issue Type: New Feature
> Reporter: Jared Becksfort
> Priority: Minor
> Attachments: AbstractMultivariateRealDistribution.java.patch,
> MixtureMultivariateRealDistribution.java.patch,
> MultivariateNormalDistribution.java.patch,
> MultivariateNormalMixtureExpectationMaximizationFitter.java,
> MultivariateNormalMixtureExpectationMaximizationFitterTest.java
>
> Original Estimate: 1m
> Remaining Estimate: 1m
>
> I will submit a class for fitting Multivariate Normal Mixture Models using
> Expectation Maximization.
> > Hello,
> >
> > I have implemented some classes for multivariate Normal distributions,
> > multivariate normal mixture models, and an expectation maximization fitting
> > class for the mixture model. I would like to submit it to Apache Commons
> > Math. I still have some touching up to do so that they fit the style
> > guidelines and implement the correct interfaces. Before I do so, I thought
> > I would at least ask if the developers of the project are interested in me
> > submitting them.
> >
> > Thanks,
> > Jared Becksfort
> Dear Jared,
> Yes, that would be very nice to have such an addition! Remember to also
> include unit tests (refer to the current ones for examples). The best would
> be to split a submission up into multiple minor ones, each covering a natural
> submission (e.g. multivariate Normal distribution in one submission), and
> create an issue as described at
> http://commons.apache.org/math/issue-tracking.html .
> If you run into any problems, please do not hesitate to ask on this mailing
> list.
> Cheers, Mikkel.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira