[ 
https://issues.apache.org/jira/browse/SPARK-18581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15695552#comment-15695552
 ] 

Sean Owen commented on SPARK-18581:
-----------------------------------

Yes, but it need not be invertible, for the reason you give, and the code 
appears to handle this: pinvS is a pseudo-inverse of the diagonal matrix of 
eigenvalues, which can contain zeroes.
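A minimal sketch of that idea in plain Scala, assuming the eigenvalues of the 
covariance matrix have already been computed. The helper name 
`pseudoInverseDiag` and the tolerance of `1e-12` are hypothetical choices for 
illustration, not Spark's code. Eigenvalues at or below the tolerance map to 
zero rather than to infinity, so a singular covariance matrix does not break 
the computation.

```scala
object PseudoInverseDemo {
  // Hypothetical helper: pseudo-inverse of diag(eigenvalues).
  // Eigenvalues at or below `tol` are treated as zero and map to 0.0,
  // instead of to 1/0 = Infinity as a plain inverse would produce.
  def pseudoInverseDiag(eigenvalues: Array[Double], tol: Double): Array[Double] =
    eigenvalues.map(v => if (v > tol) 1.0 / v else 0.0)

  def main(args: Array[String]): Unit = {
    // Covariance eigenvalues for a dataset where one feature is constant.
    val eigenvalues = Array(4.0, 0.5, 0.0)
    val pinv = pseudoInverseDiag(eigenvalues, tol = 1e-12)
    println(pinv.mkString(", ")) // 0.25, 2.0, 0.0
  }
}
```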

Backing up, though: on re-reading, I see you're saying you get a PDF > 1, but 
that's perfectly normal. A PDF does not need to be <= 1; only its integral over 
the whole space has to equal 1.
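For example, the univariate normal density can be written out directly (no 
Spark dependency; `normalPdf` and `integrate` are hypothetical helpers). With a 
standard deviation of 0.01, the density at the mean is about 39.89, yet a 
numerical integral over +/-10 standard deviations still comes out at about 1.

```scala
object DensityAboveOne {
  // Univariate normal density N(mu, sigma^2) evaluated at x.
  def normalPdf(x: Double, mu: Double, sigma: Double): Double = {
    val z = (x - mu) / sigma
    math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.Pi))
  }

  // Midpoint-rule approximation of the integral of the density over [lo, hi].
  def integrate(mu: Double, sigma: Double, lo: Double, hi: Double, steps: Int): Double = {
    val h = (hi - lo) / steps
    (0 until steps).map(i => normalPdf(lo + (i + 0.5) * h, mu, sigma)).sum * h
  }

  def main(args: Array[String]): Unit = {
    val sigma = 0.01
    println(normalPdf(0.0, 0.0, sigma))               // about 39.89, well above 1
    println(integrate(0.0, sigma, -0.1, 0.1, 100000)) // about 1.0
  }
}
```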

Are you, however, saying you observe a big numeric inaccuracy in this case?

> MultivariateGaussian does not check if covariance matrix is invertible
> ----------------------------------------------------------------------
>
>                 Key: SPARK-18581
>                 URL: https://issues.apache.org/jira/browse/SPARK-18581
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>    Affects Versions: 1.6.2, 2.0.2
>            Reporter: Hao Ren
>
> When training a GaussianMixtureModel, I found some probabilities much larger 
> than 1. That led me to the fact that the value returned by 
> MultivariateGaussian.pdf can be on the order of 10^5.
> After reviewing the code, I found that the problem lies in the computation of 
> the determinant of the covariance matrix.
> The computation is simplified by using the pseudo-determinant of a positive 
> definite matrix.
> In my case, one feature is 0 for every data point.
> As a result, the covariance matrix is not invertible (det(covariance matrix) = 
> 0), so the pseudo-determinant will be very close to zero.
> Thus, log(pseudo-determinant) will be a large negative number, which makes 
> logpdf very large, so pdf ends up far greater than 1.
> As the comments in MultivariateGaussian.scala say:
> """
> Singular values are considered to be non-zero only if they exceed a tolerance 
> based on machine precision.
> """
> But if a singular value is considered to be zero, the covariance matrix is 
> non-invertible, which contradicts the assumption that it should be 
> invertible.
> So we should check whether any singular value is smaller than the tolerance 
> before computing the pseudo-determinant.
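The reporter's argument can be checked numerically. The sketch below assumes 
the log-density of a k-dimensional Gaussian at its own mean is 
-0.5 * (k * log(2*pi) + log pdet), with log pdet taken as the sum of 
log-eigenvalues above a tolerance. The tolerance `eps * k * max(eigenvalue)` 
and the helper names (`tolerance`, `logPdfAtMean`, `requireInvertible`) are 
assumptions made for illustration, not Spark's API. It shows how a tiny but 
nonzero eigenvalue inflates the density, and adds the kind of up-front check 
the report proposes.

```scala
object SingularCovarianceCheck {
  // Machine epsilon for Double (2^-52).
  val Eps: Double = math.ulp(1.0)

  // Assumed tolerance: eigenvalues at or below eps * k * max(eigenvalue) count as zero.
  def tolerance(eigenvalues: Array[Double]): Double =
    Eps * eigenvalues.length * eigenvalues.max

  // Log-density of a k-dimensional Gaussian at its own mean, using the
  // log pseudo-determinant (sum of logs of eigenvalues above the tolerance).
  def logPdfAtMean(eigenvalues: Array[Double]): Double = {
    val tol = tolerance(eigenvalues)
    val logPseudoDet = eigenvalues.filter(_ > tol).map(math.log).sum
    -0.5 * (eigenvalues.length * math.log(2.0 * math.Pi) + logPseudoDet)
  }

  // Hypothetical check in the spirit of the report: refuse to build the
  // distribution if any eigenvalue falls at or below the tolerance.
  def requireInvertible(eigenvalues: Array[Double]): Unit = {
    val tol = tolerance(eigenvalues)
    val tiny = eigenvalues.filter(_ <= tol)
    require(tiny.isEmpty,
      s"Covariance matrix is singular: ${tiny.length} eigenvalue(s) <= $tol")
  }

  def main(args: Array[String]): Unit = {
    // Well-conditioned 2-D covariance: density at the mean is about 0.159.
    println(math.exp(logPdfAtMean(Array(1.0, 1.0))))
    // Nearly constant second feature: density at the mean is about 1.6e4.
    println(math.exp(logPdfAtMean(Array(1.0, 1e-10))))
    // Exactly constant second feature: the check rejects it.
    try requireInvertible(Array(1.0, 0.0))
    catch { case e: IllegalArgumentException => println(e.getMessage) }
  }
}
```

With this assumed tolerance, an eigenvalue of 1e-10 is not treated as zero, so 
`requireInvertible` accepts that case even though it yields a density of 
roughly 1.6e4; the check only rejects eigenvalues at or below the tolerance, 
such as those from an exactly constant feature.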



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
