[ https://issues.apache.org/jira/browse/SPARK-11302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14974982#comment-14974982 ]
Sean Owen commented on SPARK-11302:
-----------------------------------

OK, I reproduced all of this, thank you. This is roughly the code you can use to see the very large logpdf for this value:

{code}
import breeze.linalg.{DenseMatrix => BDM, DenseVector => BDV, diag, eigSym, max}

val breezeMu = new BDV(Array(1055.3910505836575, 1070.489299610895, 1.39020554474708, 1040.5907503867697))
val breezeSigma = new BDM(4, 4, Array(
  166769.00466698944, 169336.6705268059, 12.820670788921873, 164243.93314092053,
  169336.6705268059, 172041.5670061245, 21.62590020524533, 166678.01075856484,
  12.820670788921873, 21.62590020524533, 0.872524191943962, 4.283255814732373,
  164243.93314092053, 166678.01075856484, 4.283255814732373, 161848.9196719207))

// Machine epsilon
val EPSILON = {
  var eps = 1.0
  while ((1.0 + (eps / 2.0)) != 1.0) { eps /= 2.0 }
  eps
}

val eigSym.EigSym(d, u2) = eigSym(breezeSigma)
val tol = EPSILON * max(d) * d.length

// Log pseudo-determinant: sum of logs of eigenvalues above tolerance
val logPseudoDetSigma = d.activeValuesIterator.filter(_ > tol).map(math.log).sum

// Diagonal matrix of 1/sqrt(eigenvalue) for eigenvalues above tolerance
val pinvS = diag(new BDV(d.map(v => if (v > tol) math.sqrt(1.0 / v) else 0.0).toArray))
val (rootSigmaInv: BDM[Double], u: Double) =
  (pinvS * u2, -0.5 * (breezeMu.size * math.log(2.0 * math.Pi) + logPseudoDetSigma))

val x = new BDV(Array(629.0, 640.0, 1.7188, 618.19))
val delta = x - breezeMu
val v = rootSigmaInv * delta
u + v.t * v * -0.5  // logpdf
{code}

The problem is the clever trick used here to compute delta' * inv(sigma) * delta as (rootSigmaInv * delta)' * (rootSigmaInv * delta), where rootSigmaInv is built from the square roots of the inverted eigenvalues. That square-root step loses too much precision in a case like this. I think it's pretty easy to avoid entirely: there's no great reason not to return u and inv(sigma) directly and compute the quadratic form in the straightforward way.
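To make the two formulations concrete, here is a minimal sketch (plain Python, stdlib only — not the Spark/Breeze code) of the identity involved. Given the eigendecomposition Sigma = U diag(d) U', the quadratic form delta' * pinv(Sigma) * delta can be computed directly as sum_i (u_i . delta)^2 / d_i, rather than first forming rootSigmaInv = diag(1/sqrt(d)) * U' and squaring rootSigmaInv * delta. The 2x2 Sigma and all numbers below are illustrative only, chosen so the eigendecomposition is known in closed form.

```python
import math

# Toy Sigma = [[2, 1], [1, 2]]: eigenvalues 3 and 1, orthonormal eigenvectors below.
eigenvalues = [3.0, 1.0]
s = 1.0 / math.sqrt(2.0)
eigenvectors = [[s, s], [s, -s]]  # rows are the eigenvectors of Sigma

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def quad_via_sqrt(delta):
    """The 'sqrt trick': v_i = (u_i . delta) / sqrt(d_i), result = sum of v_i**2."""
    return sum((dot(u, delta) / math.sqrt(d)) ** 2
               for u, d in zip(eigenvectors, eigenvalues))

def quad_direct(delta):
    """Straightforward form: sum of (u_i . delta)**2 / d_i, no intermediate sqrt."""
    return sum(dot(u, delta) ** 2 / d
               for u, d in zip(eigenvectors, eigenvalues))

delta = [0.3, -0.4]
# For this well-conditioned Sigma the two agree; the comment above argues that
# for a near-singular Sigma the intermediate square root can cost precision.
print(quad_via_sqrt(delta), quad_direct(delta))
```

For this benign matrix both paths return delta' * inv(Sigma) * delta = 0.74/3; the point of the comment is that they need not agree so well once Sigma is nearly singular.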
> Multivariate Gaussian Model with Covariance matrix return zero always
> ------------------------------------------------------------------------
>
>                 Key: SPARK-11302
>                 URL: https://issues.apache.org/jira/browse/SPARK-11302
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>            Reporter: eyal sharon
>            Priority: Minor
>
> I have been trying to apply an anomaly detection model using Spark MLlib.
> As input, I feed the model a mean vector and a covariance matrix, assuming
> my features have covariance. Here are my inputs to the model; it returns
> zero for every data point with this input.
>
> MU vector:
> 1054.8, 1069.8, 1.3, 1040.1
>
> Cov matrix:
> 165496.0, 167996.0, 11.0, 163037.0
> 167996.0, 170631.0, 19.0, 165405.0
> 11.0, 19.0, 0.0, 2.0
> 163037.0, 165405.0, 2.0, 160707.0
>
> Conversely, for the no-covariance case, represented by this diagonal matrix,
> the model works and returns results as expected:
>
> 165496.0, 0.0, 0.0, 0.0
> 0.0, 170631.0, 0.0, 0.0
> 0.0, 0.0, 0.8, 0.0
> 0.0, 0.0, 0.0, 160594.2

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)