[ https://issues.apache.org/jira/browse/SPARK-11302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14974982#comment-14974982 ]

Sean Owen commented on SPARK-11302:
-----------------------------------

OK I reproduced all of this, thank you. This is roughly the code you can use to 
see the very large logpdf for this value:

{code}
import breeze.linalg.{DenseMatrix => BDM, DenseVector => BDV, diag, eigSym, max}

val breezeMu = new BDV(Array(
  1055.3910505836575, 1070.489299610895, 1.39020554474708, 1040.5907503867697))

val breezeSigma = new BDM(4, 4, Array(
  166769.00466698944, 169336.6705268059, 12.820670788921873, 164243.93314092053,
  169336.6705268059, 172041.5670061245, 21.62590020524533, 166678.01075856484,
  12.820670788921873, 21.62590020524533, 0.872524191943962, 4.283255814732373,
  164243.93314092053, 166678.01075856484, 4.283255814732373, 161848.9196719207))

// Machine epsilon: halve eps until 1.0 + eps/2 rounds back to 1.0
val EPSILON = {
  var eps = 1.0
  while ((1.0 + (eps / 2.0)) != 1.0) {
    eps /= 2.0
  }
  eps
}

// Eigendecomposition of sigma; eigenvalues at or below tol count as zero
val eigSym.EigSym(d, u2) = eigSym(breezeSigma)
val tol = EPSILON * max(d) * d.length
val logPseudoDetSigma = d.activeValuesIterator.filter(_ > tol).map(math.log).sum
// D^(-1/2) with small eigenvalues zeroed out
val pinvS = diag(new BDV(d.map(v => if (v > tol) math.sqrt(1.0 / v) else 0.0).toArray))

// rootSigmaInv = D^(-1/2) * U' (note the transpose, as in MultivariateGaussian)
val (rootSigmaInv: BDM[Double], u: Double) =
  (pinvS * u2.t, -0.5 * (breezeMu.size * math.log(2.0 * math.Pi) + logPseudoDetSigma))

val x = new BDV(Array(629.0, 640.0, 1.7188, 618.19))

val delta = x - breezeMu
val v = rootSigmaInv * delta
u + v.t * v * -0.5
{code}
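The EPSILON block above is just computing machine epsilon for doubles; a quick plain-Scala check (standard library only) of where the loop lands:

```scala
// Same loop as in the snippet: halve eps until 1.0 + eps/2 rounds back to 1.0
var eps = 1.0
while ((1.0 + (eps / 2.0)) != 1.0) {
  eps /= 2.0
}
// For IEEE 754 doubles this is 2^-52, i.e. math.ulp(1.0),
// so tol = EPSILON * max(d) * d.length scales with the largest eigenvalue
println(eps)
```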

The problem is the clever trick here to compute delta' * pinv(sigma) * delta 
as v' * v, where v = rootSigmaInv * delta and rootSigmaInv = D^(-1/2) * U'. The 
square-root bit loses too much precision in a case like this.
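Concretely, the trick relies on v' * v equalling delta' * pinv(sigma) * delta. A plain-Scala sketch of the two routes to the quadratic form, on a toy diagonal sigma so U is the identity (names here are illustrative, not from the Spark source):

```scala
// Toy diagonal sigma: eigenvalues d_i on the diagonal, eigenvectors = identity
val eigenvalues = Array(4.0, 0.25)
val diff = Array(2.0, 1.0)          // plays the role of delta = x - mu

// Direct quadratic form: delta' * pinv(sigma) * delta = sum(delta_i^2 / d_i)
val direct = diff.zip(eigenvalues).map { case (x, d) => x * x / d }.sum

// Square-root route: v_i = delta_i / sqrt(d_i), then v' * v
val v = diff.zip(eigenvalues).map { case (x, d) => x / math.sqrt(d) }
val viaSqrt = v.map(x => x * x).sum // equals `direct` in exact arithmetic
```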

I think it's pretty easy to avoid entirely. There's no great reason not to 
return u and inv(sigma) directly and compute this in the straightforward way.
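The straightforward version would keep pinv(sigma) as a matrix and evaluate the quadratic form directly, with no square roots in between. A minimal plain-Scala sketch of that shape (toy 2x2 values, not the actual patch):

```scala
// Hypothetical precomputed pinv(sigma) for a 2x2 case
val sigmaInv = Array(Array(2.0, -1.0), Array(-1.0, 2.0))
val d2 = Array(1.0, 3.0)            // plays the role of delta = x - mu

// delta' * pinv(sigma) * delta, summed term by term
val quad = (for {
  i <- d2.indices
  j <- d2.indices
} yield d2(i) * sigmaInv(i)(j) * d2(j)).sum

// the logpdf would then be u - 0.5 * quad, with u as in the snippet above
```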

>  Multivariate Gaussian model with covariance matrix returns zero always 
> ------------------------------------------------------------------------
>
>                 Key: SPARK-11302
>                 URL: https://issues.apache.org/jira/browse/SPARK-11302
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>            Reporter: eyal sharon
>            Priority: Minor
>
> I have been trying to apply an anomaly detection model using Spark MLlib. 
> As input, I feed the model a mean vector and a covariance matrix, assuming 
> my features have nonzero covariance.
> Here are my inputs for the model; the model returns zero for each data 
> point with this input.
> MU vector - 
> 1054.8, 1069.8, 1.3, 1040.1
> Cov' matrix - 
> 165496.0, 167996.0, 11.0, 163037.0
> 167996.0, 170631.0, 19.0, 165405.0
> 11.0, 19.0, 0.0, 2.0
> 163037.0, 165405.0, 2.0, 160707.0
> Conversely, for the no-covariance case, represented by this diagonal 
> matrix, the model works and returns results as expected:
> 165496.0, 0.0, 0.0, 0.0
> 0.0, 170631.0, 0.0, 0.0
> 0.0, 0.0, 0.8, 0.0
> 0.0, 0.0, 0.0, 160594.2


