[ https://issues.apache.org/jira/browse/SPARK-18581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15696056#comment-15696056 ]
Hao Ren commented on SPARK-18581:
---------------------------------

Thank you for the clarification. I totally missed that part. I will compare the result to R.

> MultivariateGaussian does not check if covariance matrix is invertible
> ----------------------------------------------------------------------
>
> Key: SPARK-18581
> URL: https://issues.apache.org/jira/browse/SPARK-18581
> Project: Spark
> Issue Type: Bug
> Components: MLlib
> Affects Versions: 1.6.2, 2.0.2
> Reporter: Hao Ren
>
> When training a GaussianMixtureModel, I found some probabilities much larger
> than 1. This led me to the fact that the value returned by
> MultivariateGaussian.pdf can be 10^5, etc.
> After reviewing the code, I found that the problem lies in the computation of
> the determinant of the covariance matrix.
> The computation is simplified by using the pseudo-determinant of a positive
> definite matrix.
> In my case, I have a feature = 0 for all data points.
> As a result, the covariance matrix is not invertible <=> det(covariance matrix) =
> 0 => the pseudo-determinant will be very close to zero.
> Thus, log(pseudo-determinant) will be a large negative number, which finally
> makes logpdf very large, and pdf will be even larger (> 1).
> As said in the comments of MultivariateGaussian.scala,
> """
> Singular values are considered to be non-zero only if they exceed a tolerance
> based on machine precision.
> """
> But if a singular value is considered to be zero, the covariance matrix
> is not invertible, which contradicts the assumption that it should
> be invertible.
> So we should check whether any singular value is smaller than the tolerance
> before computing the pseudo-determinant.
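
For illustration, here is a minimal sketch of the mechanism the quoted report describes. It is not the code in MultivariateGaussian.scala. It assumes a diagonal covariance matrix, so that the singular values are simply the per-feature variances, and it assumes a tolerance of machine epsilon times the largest singular value times the dimension. The function names `pseudo_log_pdf` and `near_singular_directions` are hypothetical.

```python
import math
import sys


def tolerance(variances):
    # Assumed tolerance rule: machine epsilon scaled by the largest
    # singular value and the dimension. This is an illustration only.
    return sys.float_info.epsilon * max(variances) * len(variances)


def near_singular_directions(variances):
    # The check proposed in the report: list the features whose singular
    # value does not exceed the tolerance (covariance not invertible).
    tol = tolerance(variances)
    return [i for i, v in enumerate(variances) if v <= tol]


def pseudo_log_pdf(x, mean, variances):
    # Log-density of a Gaussian with diagonal covariance, where singular
    # values at or below the tolerance are treated as zero and skipped
    # in both the pseudo-determinant and the quadratic form.
    k = len(mean)
    tol = tolerance(variances)
    kept = [(xi - mi, v) for xi, mi, v in zip(x, mean, variances) if v > tol]
    log_pdet = sum(math.log(v) for _, v in kept)
    quad = sum(d * d / v for d, v in kept)
    return -0.5 * (k * math.log(2.0 * math.pi) + log_pdet) - 0.5 * quad


mean = [0.0, 0.0]

# A tiny but non-zero variance passes the tolerance test, so
# log(pseudo-determinant) is a large negative number and the density
# at the mean is on the order of 10^5.
tiny = [1.0, 1e-12]
print(near_singular_directions(tiny), math.exp(pseudo_log_pdf(mean, mean, tiny)))

# An exactly constant feature has variance 0, which is dropped from the
# pseudo-determinant and flagged by the check.
constant = [1.0, 0.0]
print(near_singular_directions(constant), math.exp(pseudo_log_pdf(mean, mean, constant)))
```

Under these assumptions the first case shows how a near-degenerate feature yields a density far above 1, while the second shows that an exactly constant feature is silently excluded unless a check such as `near_singular_directions` reports it.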