[ 
https://issues.apache.org/jira/browse/SPARK-18581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hao Ren updated SPARK-18581:
----------------------------
    Description: 
When training a GaussianMixtureModel, I found some probabilities much larger 
than 1. That led me to the fact that the value returned by 
MultivariateGaussian.pdf can be as large as 10^5.

After reviewing the code, I found that the problem lies in the computation of 
the determinant of the covariance matrix.

The computation is simplified by using the pseudo-determinant of a positive 
definite matrix. However, if the eigenvalues are all between 0 and 1, 
log(pseudo-determinant) will be a negative number such as -50. As a result, the 
logpdf becomes positive (pdf > 1).

The related code is the following:

// In function: MultivariateGaussian.calculateCovarianceConstants()

{code}
val logPseudoDetSigma = d.activeValuesIterator.filter(_ > tol).map(math.log).sum
{code}

d is the vector of eigenvalues here. If many of its elements are between 0 and 
1, then logPseudoDetSigma could be negative.
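
To make the arithmetic concrete, here is a minimal sketch in plain Scala (no 
Breeze or Spark dependency). The eigenvalues, the tolerance value and the 
object name PseudoDetSketch are illustrative assumptions, not taken from 
Spark. It also assumes the usual Gaussian log-density evaluated at the mean, 
where the quadratic term vanishes: -0.5 * (k * log(2 * pi) + log det).

```scala
// Illustrative sketch only; names and values are assumptions, not Spark code.
object PseudoDetSketch {
  // Hypothetical eigenvalues of a covariance matrix: ten values, all in (0, 1).
  val d: Array[Double] = Array.fill(10)(0.01)

  // Assumed cut-off for "numerically zero" eigenvalues (not Spark's formula).
  val tol: Double = 1e-12

  // Same shape as the quoted expression: sum the logs of the eigenvalues
  // that exceed the tolerance.
  val logPseudoDetSigma: Double = d.iterator.filter(_ > tol).map(math.log).sum

  // Log-density of a k-dimensional Gaussian at its mean, where the
  // Mahalanobis term is zero: -0.5 * (k * log(2 * pi) + log det).
  val logPdfAtMean: Double =
    -0.5 * (d.length * math.log(2.0 * math.Pi) + logPseudoDetSigma)

  def main(args: Array[String]): Unit = {
    println(s"logPseudoDetSigma = $logPseudoDetSigma")
    println(s"logpdf at mean    = $logPdfAtMean")
    println(s"pdf at mean       = ${math.exp(logPdfAtMean)}")
  }
}
```

With these assumed inputs every log term is negative, so logPseudoDetSigma is 
negative and the logpdf at the mean comes out positive.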



  was:
When training a GaussianMixtureModel, I found some probabilities much larger 
than 1. That led me to the fact that the value returned by 
MultivariateGaussian.pdf can be as large as 10^5.

After reviewing the code, I found that the problem lies in the computation of 
the determinant of the covariance matrix.

The computation is simplified by using the pseudo-determinant of a positive 
definite matrix. However, if the eigenvalues are all between 0 and 1, 
log(pseudo-determinant) will be a negative number such as -50. As a result, the 
logpdf becomes positive (pdf > 1).

The related code is the following:

// In function: MultivariateGaussian.calculateCovarianceConstants()

{code}
val logPseudoDetSigma = d.activeValuesIterator.filter(_ > tol).map(math.log).sum
{code}

d is the vector of eigenvalues here. If many of its elements are between 0 and 
1, then logPseudoDetSigma could be negative.

Maybe we should just use the breeze 'det' operation on sigma to get the right 
but slow answer instead of a quick, wrong one.


> MultivariateGaussian does not check if covariance matrix is invertible
> ----------------------------------------------------------------------
>
>                 Key: SPARK-18581
>                 URL: https://issues.apache.org/jira/browse/SPARK-18581
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>    Affects Versions: 1.6.2, 2.0.2
>            Reporter: Hao Ren


