[ 
https://issues.apache.org/jira/browse/SPARK-11302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14973325#comment-14973325
 ] 

eyal sharon commented on SPARK-11302:
-------------------------------------

Hi Sean,

Thanks for your reply. I will try to add more information.

 - I'm using a multivariate Gaussian for anomaly detection, specifically the
MultivariateGaussian class from MLlib:
<https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/stat/distribution/MultivariateGaussian.scala>

This class lets you create a Gaussian instance and pass it a new data
point (a dense vector) to get back its probability density.
However, when I run my code, it always returns zero.

 - I checked my code against this example implementation of anomaly detection
that I found on GitHub: <https://github.com/vivanov/anomaly-detection>
Note that this example uses a *non-covariance* (diagonal) matrix. If you run
this code with a full covariance matrix, the PDF function always returns zero.
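
For a diagonal covariance matrix the density factorizes into independent
one-dimensional Gaussians, so the working case can be cross-checked by hand.
Below is a minimal sketch in plain Scala with no Spark dependency. The names
DiagonalGaussian, logPdf, and pdf are made up for this illustration and are not
part of MLlib.

```scala
object DiagonalGaussian {
  // Log-density of N(mu, diag(variances)) at x:
  //   -0.5 * sum_i [ log(2 * pi * var_i) + (x_i - mu_i)^2 / var_i ]
  def logPdf(x: Array[Double], mu: Array[Double], variances: Array[Double]): Double = {
    require(x.length == mu.length && mu.length == variances.length, "dimension mismatch")
    require(variances.forall(_ > 0.0), "variances must be positive")
    x.indices.map { i =>
      val d = x(i) - mu(i)
      -0.5 * (math.log(2.0 * math.Pi * variances(i)) + d * d / variances(i))
    }.sum
  }

  // Density obtained by exponentiating the log-density.
  def pdf(x: Array[Double], mu: Array[Double], variances: Array[Double]): Double =
    math.exp(logPdf(x, mu, variances))

  def main(args: Array[String]): Unit = {
    // Mean vector and diagonal entries quoted in the issue description.
    val mu = Array(1054.8, 1069.8, 1.3, 1040.1)
    val variances = Array(165496.0, 170631.0, 0.8, 160594.2)
    // Density evaluated at the mean itself.
    println(pdf(mu, mu, variances))
  }
}
```

Comparing this value with what MultivariateGaussian.pdf returns for the same
diagonal input is a quick sanity check before moving on to the full covariance
matrix.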

To check the covariance case, here is a function that takes a data set
(mat) of features and the corresponding mean vector (mu) and returns the
covariance matrix:

import org.apache.spark.mllib.linalg.{DenseMatrix, Vector}

def createCovSigma(mat: DenseMatrix, mu: Vector): DenseMatrix = {
  // Rows of mat: the column-major array of the transpose is mat in row-major order.
  val rows = mat.transpose.toArray.grouped(mat.numCols).toArray

  // Subtract the mean vector from every row.
  val centeredRows = rows.map(row => row.zip(mu.toArray).map { case (x, m) => x - m })

  // The values are row-major, hence isTransposed = true.
  val centered = new DenseMatrix(mat.numRows, mat.numCols, centeredRows.flatten, true)
  val scatter: DenseMatrix = centered.transpose.multiply(centered)

  // Divide by the number of rows; the result is symmetric, so the layout does not matter.
  new DenseMatrix(mat.numCols, mat.numCols, scatter.toArray.map(_ / mat.numRows))
}
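
The function above computes the biased sample covariance,
Sigma = (1/m) * sum_i (x_i - mu)(x_i - mu)^T, where m is the number of rows.
As a cross-check that does not depend on the MLlib matrix layout flags, here is
a minimal sketch over plain arrays. PlainCovariance and covariance are
hypothetical names used only for illustration, and the data in main is made up.

```scala
object PlainCovariance {
  // rows: one sample per row; mu: per-feature mean.
  // Returns the n x n matrix (1/m) * sum_i (x_i - mu)(x_i - mu)^T.
  def covariance(rows: Array[Array[Double]], mu: Array[Double]): Array[Array[Double]] = {
    require(rows.nonEmpty, "need at least one sample")
    require(rows.forall(_.length == mu.length), "every row must match the length of mu")
    val m = rows.length
    val n = mu.length
    // Subtract the mean from every sample.
    val centered = rows.map(row => row.zip(mu).map { case (x, mean) => x - mean })
    // Entry (i, j) is the average product of centered features i and j.
    Array.tabulate(n, n) { (i, j) =>
      centered.map(r => r(i) * r(j)).sum / m
    }
  }

  def main(args: Array[String]): Unit = {
    // Tiny made-up data set: 3 samples, 2 features.
    val rows = Array(Array(1.0, 2.0), Array(2.0, 4.0), Array(3.0, 6.0))
    val mu = Array(2.0, 4.0)
    covariance(rows, mu).foreach(r => println(r.mkString(", ")))
  }
}
```

Like the function above, this divides by m rather than m - 1.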


If you need more information, I will add it.

Thanks!

Eyal


>  Multivariate Gaussian Model with Covariance  matrix return zero always 
> ------------------------------------------------------------------------
>
>                 Key: SPARK-11302
>                 URL: https://issues.apache.org/jira/browse/SPARK-11302
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>            Reporter: eyal sharon
>            Priority: Minor
>
> I have been trying to apply an anomaly detection model using Spark MLlib.
> As input, I feed the model a mean vector and a covariance matrix,
> assuming my features covary.
> Here is my input for the model; the model returns zero for each data
> point with this input.
> MU vector -
> 1054.8, 1069.8, 1.3, 1040.1
> Cov' matrix -
> 165496.0, 167996.0, 11.0, 163037.0
> 167996.0, 170631.0, 19.0, 165405.0
>     11.0,     19.0,  0.0,      2.0
> 163037.0, 165405.0,  2.0, 160707.0
> Conversely, for the non-covariance case, represented by this matrix, the
> model works and returns results as expected:
> 165496.0,      0.0, 0.0,      0.0
>      0.0, 170631.0, 0.0,      0.0
>      0.0,      0.0, 0.8,      0.0
>      0.0,      0.0, 0.0, 160594.2


