[ 
https://issues.apache.org/jira/browse/MATH-817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13496118#comment-13496118
 ] 

Gilles commented on MATH-817:
-----------------------------

bq. [...] public functions after initialization [...] 

Sorry for the misunderstanding, but again that's not what I mean. Those public 
functions would help the user to set up the necessary arguments _before_ 
initialization. It's _not_ the business of the optimization algorithm to figure 
out the initial guesses: whether those are chosen carefully or randomly or 
estimated from sample data, the algorithm, as such, starts its actual work with 
a fully defined mixture of Gaussian distributions.

A lean API also makes for clearer, more maintainable code; so we should strive 
to have each algorithm's implementation focus on its job. Helper utilities, 
such as initialization by estimation, by randomization, or from an incomplete 
specification, can come later, e.g. as subclasses or as utility functions.
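
To illustrate the separation, here is a minimal sketch of such a helper living outside the fitter. It is deliberately simplified to univariate data, and the class name "InitialGuessHelper" and the method "chunkMeans" are hypothetical, not part of the submitted patch: the data are sorted, split into equally sized chunks, and each chunk's mean serves as an initial guess.

```java
import java.util.Arrays;

/** Hypothetical utility: computes initial guesses before any fitter exists. */
final class InitialGuessHelper {
    private InitialGuessHelper() {}

    /**
     * Sorts a copy of the data, splits it into numComponents contiguous
     * chunks and returns the mean of each chunk.
     */
    static double[] chunkMeans(double[] data, int numComponents) {
        if (numComponents < 1 || data.length < numComponents) {
            throw new IllegalArgumentException("Need at least one point per component");
        }
        double[] sorted = data.clone();
        Arrays.sort(sorted);

        double[] means = new double[numComponents];
        for (int k = 0; k < numComponents; k++) {
            // Chunk k covers indices [from, to) of the sorted data.
            int from = k * sorted.length / numComponents;
            int to = (k + 1) * sorted.length / numComponents;
            double sum = 0;
            for (int i = from; i < to; i++) {
                sum += sorted[i];
            }
            means[k] = sum / (to - from);
        }
        return means;
    }
}
```

The fitter never needs to know how the guess was produced; it only receives the result.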

bq. A method called setInitialMeans(double[][] initialMeans) [...]

Please don't do that. The steps (i.e. construction, initialization, number of 
method calls) needed to perform some action and get a reliable result should 
be as few as possible. Among other things, it is better to consider that 
construction _is_ initialization, thereby removing the need for an additional 
initialization step (and the risk that this step is forgotten during usage).
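
As a rough sketch of "construction is initialization" (simplified to univariate components; "Component" and "ConstructedFitter" are hypothetical names used only for illustration), the initial mixture is a constructor argument, so an instance can never exist in a half-initialized state:

```java
/** Hypothetical, simplified stand-in for one univariate Gaussian component. */
final class Component {
    final double weight;
    final double mean;
    final double variance;

    Component(double weight, double mean, double variance) {
        this.weight = weight;
        this.mean = mean;
        this.variance = variance;
    }
}

/** Hypothetical fitter whose initial mixture is fixed at construction. */
final class ConstructedFitter {
    private final Component[] initialMixture;

    ConstructedFitter(Component... initialMixture) {
        if (initialMixture.length == 0) {
            throw new IllegalArgumentException("At least one component is required");
        }
        // Defensive copy: later changes to the caller's array have no effect.
        this.initialMixture = initialMixture.clone();
    }

    int getNumberOfComponents() {
        return initialMixture.length;
    }
}
```

There is no setter to forget: an invalid initial state is rejected by the constructor rather than discovered later, in the middle of a fit.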

Of course, there are mixed cases where there is no clear-cut separation between 
data that can be fixed at construction and arguments that can be passed to the 
object's methods.

A typical example would be the number of components which the fitted mixture 
should contain. Is it a parameter to be fixed at the fitter's construction?
{code}
public class EMFitter1 {
  final int numberOfComponents;

  public EMFitter1(int numComp) {
    numberOfComponents = numComp;
  }

  public MixtureMultivariateRealDistribution<MultivariateNormalDistribution> fit(double[][] data) {
    // Fit a mixture with "numberOfComponents" components.
  }
}
{code}
Or should it be an additional argument to the "fit" method?
{code}
public class EMFitter2 {
  public EMFitter2() {}

  public MixtureMultivariateRealDistribution<MultivariateNormalDistribution> fit(int numComp, double[][] data) {
    // Fit a mixture with "numComp" components.
  }
}
{code}
In this latter case, the rationale would be that the number of components is a 
"parameter" of the algorithm that should not require a new object.
But note that it is a matter of interpretation: in the case of "EMFitter1", 
there is an equally valid rationale in saying that an instance of the fitter 
encapsulates the fitting by a fixed number of components!

bq. [...] it may be OK and make for a clearer API to have some public members 
allowing specification of various initial estimates.

This approach already has the problem of leaving users to wonder what happens to 
the "initial covariances" when they call "setInitialMeans": Are the covariances 
set to random values, or kept at their previous values? What happens if there 
are no previous values?
Another problem is that it adds a number of steps and makes the API more 
susceptible to wrong usage.
If you want to allow for multiple calls to "fit" with different parameters, we 
might want to use the same approach as we are implementing in the "optimization" 
package (with the new interface "OptimizationData"). Could you please have a 
look? [But please note that we went that "far"[1] in order to accommodate 
various algorithms that needed _different_ parameter types within the same API.]

[1] "OptimizationData" is just a marker interface (i.e. with no functionality) 
and that's not something to be abused too much.
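
For reference, the general shape of the marker-interface approach could look like the sketch below. All names here ("FitData", "NumberOfComponents", "MaxIterations", "ConfigurableFitter") are hypothetical and only illustrate the pattern; they are not the actual "optimization" package API. The fitting itself is omitted, and the method just reports the settings it resolved.

```java
/** Hypothetical marker interface: no methods, it only tags parameter types. */
interface FitData {}

final class NumberOfComponents implements FitData {
    private final int value;
    NumberOfComponents(int value) { this.value = value; }
    int getValue() { return value; }
}

final class MaxIterations implements FitData {
    private final int value;
    MaxIterations(int value) { this.value = value; }
    int getValue() { return value; }
}

final class ConfigurableFitter {
    /**
     * Resolves the settings for one call. Defaults are restored on every
     * call, so no state leaks from a previous invocation.
     */
    String fit(double[] data, FitData... options) {
        int numberOfComponents = 1;
        int maxIterations = 100;
        for (FitData option : options) {
            if (option instanceof NumberOfComponents) {
                numberOfComponents = ((NumberOfComponents) option).getValue();
            } else if (option instanceof MaxIterations) {
                maxIterations = ((MaxIterations) option).getValue();
            }
            // Unrecognized option types are silently ignored in this sketch.
        }
        // Actual fitting omitted; return the resolved settings instead.
        return "components=" + numberOfComponents + ", maxIterations=" + maxIterations;
    }
}
```

The price of this flexibility is that the compiler can no longer check which parameters a given algorithm requires, which is one reason the pattern should be used sparingly.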
                
> Multivariate Normal Mixture Model Fitting by Expectation Maximization
> ---------------------------------------------------------------------
>
>                 Key: MATH-817
>                 URL: https://issues.apache.org/jira/browse/MATH-817
>             Project: Commons Math
>          Issue Type: New Feature
>            Reporter: Jared Becksfort
>            Priority: Minor
>         Attachments: AbstractMultivariateRealDistribution.java.patch, 
> MixtureMultivariateRealDistribution.java.patch, 
> MultivariateNormalDistribution.java.patch, 
> MultivariateNormalMixtureExpectationMaximizationFitter.java, 
> MultivariateNormalMixtureExpectationMaximizationFitterTest.java
>
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> I will submit a class for fitting Multivariate Normal Mixture Models using 
> Expectation Maximization.
> > Hello,
> >
> > I have implemented some classes for multivariate Normal distributions, 
> > multivariate normal mixture models, and an expectation maximization fitting 
> > class for the mixture model.  I would like to submit it to Apache Commons 
> > Math.  I still have some touching up to do so that they fit the style 
> > guidelines and implement the correct interfaces.  Before I do so, I thought 
> > I would at least ask if the developers of the project are interested in me 
> > submitting them.
> >
> > Thanks,
> > Jared Becksfort
> Dear Jared,
> Yes, that would be very nice to have such an addition! Remember to also 
> include unit tests (refer to the current ones for examples). The best would 
> be to split a submission up into multiple minor ones, each covering a natural 
> submission (e.g. multivariate Normal distribution in one submission), and 
> create an issue as described at 
> http://commons.apache.org/math/issue-tracking.html .
> If you run into any problems, please do not hesitate to ask on this mailing 
> list.
> Cheers, Mikkel.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
