[ https://issues.apache.org/jira/browse/SPARK-4156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14233107#comment-14233107 ]

Travis Galoppo commented on SPARK-4156:
---------------------------------------

I have modified the cluster initialization strategy to derive an initial 
covariance matrix from the sample points used to initialize the clusters; this 
initial covariance matrix has the element-wise variance of the sample points on 
the diagonal.  The final computed covariance matrix is not constrained to be 
diagonal.
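
The initialization described above can be sketched roughly as follows. This is
a minimal illustration, not DenseGmmEM's code: it assumes each cluster is seeded
from a small sample of points given as plain Python lists, and it uses the
population variance (dividing by n). The helper name init_gaussian_from_sample
is hypothetical.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def init_gaussian_from_sample(sample: List[Vector]) -> Tuple[Vector, Matrix]:
    """Hypothetical helper: seed one mixture component from a few sample points.

    Returns (mean, covariance). The covariance starts out diagonal, holding the
    element-wise (population) variance of the sample points; off-diagonal
    entries start at zero, and EM is free to make them non-zero later.
    """
    n = len(sample)
    d = len(sample[0])
    # Element-wise mean of the sample points.
    mean = [sum(p[j] for p in sample) / n for j in range(d)]
    # Element-wise variance of the sample points, one value per dimension.
    var = [sum((p[j] - mean[j]) ** 2 for p in sample) / n for j in range(d)]
    # Place the variances on the diagonal of an otherwise zero matrix.
    cov = [[var[i] if i == j else 0.0 for j in range(d)] for i in range(d)]
    return mean, cov
```

For example, init_gaussian_from_sample([[0.0, 0.0], [2.0, 4.0]]) yields the
mean [1.0, 2.0] and the covariance [[1.0, 0.0], [0.0, 4.0]]. If every sampled
point is identical in some dimension, that variance is zero and the matrix is
singular, so a real implementation may want a small regularization term.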

I tested this with the S1 dataset [~MeethuMathew] referenced above; while it
does "fix" the problem of effectively finding no clusters, I find that the
results are still better when the input is scaled as I mentioned above.  It
might be worthwhile to allow the user to provide a pre-initialized model to
accommodate various initialization strategies, with the current strategy as
the default. Thoughts?
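
One way the proposed option could look is sketched below. It is a sketch under
assumptions, not an existing API: the model representation (a list of weight,
mean, per-dimension variance triples) and the names default_initial_model and
starting_model are hypothetical, and the default branch stands in for the
current sample-based strategy.

```python
import random
from typing import List, Optional, Sequence, Tuple

# Hypothetical model representation: one (weight, mean, per-dimension variance)
# triple per cluster. This is not DenseGmmEM's actual data structure.
Component = Tuple[float, List[float], List[float]]
Model = List[Component]


def default_initial_model(data: Sequence[List[float]], k: int,
                          sample_size: int = 3, seed: int = 0) -> Model:
    """Default strategy: seed each cluster from a small random sample of points,
    using the sample mean and the element-wise (population) variance."""
    rng = random.Random(seed)
    model: Model = []
    for _ in range(k):
        sample = rng.sample(list(data), min(sample_size, len(data)))
        n, d = len(sample), len(sample[0])
        mean = [sum(p[j] for p in sample) / n for j in range(d)]
        var = [sum((p[j] - mean[j]) ** 2 for p in sample) / n for j in range(d)]
        model.append((1.0 / k, mean, var))  # equal mixing weights to start
    return model


def starting_model(data: Sequence[List[float]], k: int,
                   initial_model: Optional[Model] = None) -> Model:
    """Return the caller's pre-initialized model if one is given;
    otherwise fall back to the default initialization strategy."""
    if initial_model is None:
        return default_initial_model(data, k)
    if len(initial_model) != k:
        raise ValueError(f"expected {k} components, got {len(initial_model)}")
    return initial_model
```

Making the argument optional keeps today's behaviour for callers who pass
nothing, while anyone with a better seeding scheme can hand EM a ready-made
starting point.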

Also, I have fixed the defect in DenseGmmEM whereby it was ignoring the delta 
parameter.
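
For readers unfamiliar with the role of delta, here is a rough sketch of an EM
loop that honors it. It assumes delta is a tolerance on the change in
log-likelihood between iterations, which is common for EM but is not confirmed
by this thread. The function em_1d and its parameters are hypothetical, and it
is restricted to one dimension to stay short.

```python
import math
from typing import List, Tuple


def em_1d(xs: List[float], weights: List[float], means: List[float],
          variances: List[float], delta: float = 1e-6,
          max_iter: int = 100) -> Tuple[List[float], List[float], List[float], int]:
    """Hypothetical 1-D Gaussian-mixture EM loop that honors `delta`.

    Stops as soon as the log-likelihood changes by less than `delta` between
    consecutive iterations (or after `max_iter` iterations). Returns the fitted
    weights, means, variances and the number of E-steps performed.
    """
    weights, means, variances = list(weights), list(means), list(variances)
    k = len(weights)
    prev_ll = -math.inf
    for it in range(1, max_iter + 1):
        # E-step: per-point responsibilities and the total log-likelihood.
        ll = 0.0
        resp = []
        for x in xs:
            p = [weights[c] * math.exp(-(x - means[c]) ** 2 / (2 * variances[c]))
                 / math.sqrt(2 * math.pi * variances[c]) for c in range(k)]
            total = sum(p)
            ll += math.log(total)
            resp.append([pc / total for pc in p])
        # Convergence test driven by delta; skipping it means always
        # running until max_iter.
        if abs(ll - prev_ll) < delta:
            return weights, means, variances, it
        prev_ll = ll
        # M-step: re-estimate each component from its responsibilities.
        for c in range(k):
            nc = sum(r[c] for r in resp)
            weights[c] = nc / len(xs)
            means[c] = sum(r[c] * x for r, x in zip(resp, xs)) / nc
            variances[c] = sum(r[c] * (x - means[c]) ** 2
                               for r, x in zip(resp, xs)) / nc
    return weights, means, variances, max_iter
```

A very large delta stops the loop at the second E-step, while a small one lets
it run until the log-likelihood stops changing.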


> Add expectation maximization for Gaussian mixture models to MLLib clustering
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-4156
>                 URL: https://issues.apache.org/jira/browse/SPARK-4156
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Travis Galoppo
>            Assignee: Travis Galoppo
>
> As an additional clustering algorithm, implement expectation maximization for 
> Gaussian mixture models



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
