[
https://issues.apache.org/jira/browse/SPARK-11302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14974982#comment-14974982
]
Sean Owen commented on SPARK-11302:
-----------------------------------
OK I reproduced all of this, thank you. This is roughly the code you can use to
see the very large logpdf for this value:
{code}
import breeze.linalg.{DenseMatrix => BDM, Matrix => BM, DenseVector => BDV,
  SparseVector => BSV, Vector => BV, diag, max, eigSym}

val breezeMu = new BDV(Array(1055.3910505836575, 1070.489299610895,
  1.39020554474708, 1040.5907503867697))
val breezeSigma = new BDM(4, 4, Array(
  166769.00466698944, 169336.6705268059, 12.820670788921873, 164243.93314092053,
  169336.6705268059, 172041.5670061245, 21.62590020524533, 166678.01075856484,
  12.820670788921873, 21.62590020524533, 0.872524191943962, 4.283255814732373,
  164243.93314092053, 166678.01075856484, 4.283255814732373, 161848.9196719207))

// Machine epsilon
val EPSILON = {
  var eps = 1.0
  while ((1.0 + (eps / 2.0)) != 1.0) {
    eps /= 2.0
  }
  eps
}

// Eigenvalues d and eigenvectors u2 of the covariance matrix
val eigSym.EigSym(d, u2) = eigSym(breezeSigma)
// Eigenvalues at or below tol are treated as zero
val tol = EPSILON * max(d) * d.length
val logPseudoDetSigma = d.activeValuesIterator.filter(_ > tol).map(math.log).sum
val pinvS = diag(new BDV(d.map(v => if (v > tol) math.sqrt(1.0 / v) else 0.0).toArray))
val (rootSigmaInv: BDM[Double], u: Double) = (pinvS * u2,
  -0.5 * (breezeMu.size * math.log(2.0 * math.Pi) + logPseudoDetSigma))

val x = new BDV(Array(629, 640, 1.7188, 618.19))
val delta = x - breezeMu
val v = rootSigmaInv * delta
// logpdf at x
u + v.t * v * -0.5
{code}
The problem is the clever trick here to compute, well, delta' * inv(sigma) *
delta by computing (rootSigmaInv * delta)' * (rootSigmaInv * delta), where
rootSigmaInv is a square root of inv(sigma). The square root bit loses too
much precision in a case like this.
I think it's pretty easy to avoid entirely. There's no great reason not to
return u and inv(sigma) directly and compute this in the straightforward way.
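A minimal sketch of what the straightforward computation could look like,
assuming Breeze is on the classpath. The names pinvAndLogDet and logpdf are
made up for illustration and are not Spark's API. It builds inv(sigma) as a
pseudo-inverse from the same eigendecomposition and evaluates
delta' * inv(sigma) * delta directly, with no square root:

```scala
import breeze.linalg.{DenseMatrix => BDM, DenseVector => BDV, diag, eigSym, max}

// Hypothetical helper (not Spark's API): returns the pseudo-inverse of sigma
// and the log of its pseudo-determinant.
def pinvAndLogDet(sigma: BDM[Double]): (BDM[Double], Double) = {
  // eigSym returns eigenvalues d and eigenvectors as the columns of q,
  // so sigma = q * diag(d) * q.t
  val eigSym.EigSym(d, q) = eigSym(sigma)
  // math.ulp(1.0) is machine epsilon, the same value the EPSILON loop finds
  val tol = math.ulp(1.0) * max(d) * d.length
  val logPseudoDet = d.activeValuesIterator.filter(_ > tol).map(math.log).sum
  // Invert only the eigenvalues above tol; the rest are treated as zero
  val invD = d.map(v => if (v > tol) 1.0 / v else 0.0)
  (q * diag(invD) * q.t, logPseudoDet)
}

// Hypothetical helper: log density of a multivariate Gaussian at x
def logpdf(x: BDV[Double], mu: BDV[Double], sigma: BDM[Double]): Double = {
  val (sigmaInv, logPseudoDet) = pinvAndLogDet(sigma)
  val u = -0.5 * (mu.length * math.log(2.0 * math.Pi) + logPseudoDet)
  val delta = x - mu
  // delta' * inv(sigma) * delta, computed directly
  u - 0.5 * (delta dot (sigmaInv * delta))
}
```

For the values above, the call would be logpdf(x, breezeMu, breezeSigma).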
> Multivariate Gaussian Model with Covariance matrix return zero always
> ------------------------------------------------------------------------
>
> Key: SPARK-11302
> URL: https://issues.apache.org/jira/browse/SPARK-11302
> Project: Spark
> Issue Type: Bug
> Components: MLlib
> Reporter: eyal sharon
> Priority: Minor
>
> I have been trying to apply an Anomaly Detection model using Spark MLlib.
> As input, I feed the model a mean vector and a covariance matrix,
> assuming my features have non-zero covariance.
> Here is my input for the model; the model returns zero for each data
> point for this input.
> MU vector -
> 1054.8, 1069.8, 1.3, 1040.1
> Cov' matrix -
> 165496.0, 167996.0, 11.0, 163037.0
> 167996.0, 170631.0, 19.0, 165405.0
> 11.0, 19.0, 0.0, 2.0
> 163037.0, 165405.0, 2.0, 160707.0
> Conversely, for the non-covariance case, represented by this matrix, the
> model works and returns results as expected:
> 165496.0, 0.0, 0.0, 0.0
> 0.0, 170631.0, 0.0, 0.0
> 0.0, 0.0, 0.8, 0.0
> 0.0, 0.0, 0.0, 160594.2