[ https://issues.apache.org/jira/browse/SPARK-4156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14233107#comment-14233107 ]
Travis Galoppo commented on SPARK-4156:
---------------------------------------
I have modified the cluster initialization strategy to derive an initial
covariance matrix from the sample points used to initialize the clusters; this
initial covariance matrix has the element-wise variance of the sample points on
the diagonal. The final computed covariance matrix is not constrained to be
diagonal.
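For readers following along, here is a minimal sketch of this kind of initialization. It is not the code from the patch: the function names, the uniform initial weights, the use of the sample mean as the initial cluster mean, and the choice of population variance (dividing by n rather than n - 1) are all assumptions made for illustration.

```python
import random


def diagonal_covariance(samples):
    """Build a d x d matrix with the per-dimension variance of `samples`
    on the diagonal and zeros elsewhere.

    Assumption: population variance (divide by n); the original comment
    does not say which estimator is used.
    """
    n = len(samples)
    d = len(samples[0])
    means = [sum(p[j] for p in samples) / n for j in range(d)]
    variances = [
        sum((p[j] - means[j]) ** 2 for p in samples) / n for j in range(d)
    ]
    return [[variances[i] if i == j else 0.0 for j in range(d)] for i in range(d)]


def init_clusters(data, k, n_samples, seed=0):
    """Hypothetical initializer: for each of the k clusters, draw n_samples
    points and derive (weight, mean, covariance) from them.

    Only the starting covariance is diagonal; later EM iterations are free
    to produce a full covariance matrix.
    """
    rng = random.Random(seed)
    d = len(data[0])
    clusters = []
    for _ in range(k):
        samples = rng.sample(data, n_samples)
        mean = [sum(p[j] for p in samples) / n_samples for j in range(d)]
        clusters.append((1.0 / k, mean, diagonal_covariance(samples)))
    return clusters
```

Calling `init_clusters(points, k=3, n_samples=5)` on a list of equal-length numeric lists would give three `(weight, mean, covariance)` tuples to seed the EM iterations.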
I tested this with the S1 dataset [~MeethuMathew] referenced above; while it
does "fix" the problem of effectively finding no clusters, I find that the
results are still better when the input is scaled as I mentioned above. It
might be worthwhile to allow the user to provide a pre-initialized model to
accommodate various initialization strategies, and provide the current
functionality as a default. Thoughts?
Also, I have fixed the defect in DenseGmmEM whereby it was ignoring the delta
parameter.
> Add expectation maximization for Gaussian mixture models to MLLib clustering
> ----------------------------------------------------------------------------
>
> Key: SPARK-4156
> URL: https://issues.apache.org/jira/browse/SPARK-4156
> Project: Spark
> Issue Type: New Feature
> Components: MLlib
> Reporter: Travis Galoppo
> Assignee: Travis Galoppo
>
> As an additional clustering algorithm, implement expectation maximization for
> Gaussian mixture models