[ 
https://issues.apache.org/jira/browse/SPARK-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612291#comment-14612291
 ] 

Chris Harvey commented on SPARK-7210:
-------------------------------------

I am new to the Apache Spark project but I would like to contribute to this 
issue. 

Feynman posted an R recipe for computing the pdf using a Cholesky trick. I 
would like to compute the pdf by following that recipe, using the Cholesky 
implementation in ScalaNLP Breeze. To test speed, I would compute the pdf with 
both the original (SVD-based) method and the Cholesky method across a range of 
simulated datasets with increasing n and p. To test stability, I would compute 
the pdf on simulated features with some multicollinearity, where the covariance 
matrix is close to singular. 
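
To make the idea concrete, here is a minimal sketch of a Cholesky-based 
Gaussian log-pdf. It is a generic textbook version in plain Python, not the R 
recipe itself and not Breeze code; the function names (cholesky, forward_solve, 
gaussian_logpdf) are made up for illustration. With Sigma = L * L^T, the log 
determinant is 2 * sum(log(diag(L))) and the quadratic form is ||z||^2, where 
z solves L z = x - mu, so no explicit inverse is needed. 

```python
import math


def cholesky(a):
    """Return lower-triangular L with L * L^T == a.

    Assumes a is symmetric; raises ValueError if it is not positive definite.
    """
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def forward_solve(L, b):
    """Solve L z = b for z by forward substitution (L lower triangular)."""
    z = []
    for i, row in enumerate(L):
        z.append((b[i] - sum(row[k] * z[k] for k in range(i))) / row[i])
    return z


def gaussian_logpdf(x, mu, sigma):
    """Log-density of N(mu, sigma) at x, computed via the Cholesky factor."""
    n = len(x)
    L = cholesky(sigma)
    # z = L^{-1} (x - mu), so z . z equals (x - mu)^T sigma^{-1} (x - mu)
    z = forward_solve(L, [xi - mi for xi, mi in zip(x, mu)])
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    maha = sum(zi * zi for zi in z)
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + maha)
```

One caveat relevant to the stability test: this sketch simply fails when the 
covariance is not positive definite, whereas an SVD-based approach can fall 
back to a pseudo-inverse. How the Cholesky path should handle that case is one 
of the things the experiment would need to settle. 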

Does this sound like a good starting point? Given that this is my first attempt 
at contributing to an Apache project, might it be a good idea to do this 
through the Mentor Programme? 

> Test matrix decompositions for speed vs. numerical stability for Gaussians
> --------------------------------------------------------------------------
>
>                 Key: SPARK-7210
>                 URL: https://issues.apache.org/jira/browse/SPARK-7210
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>            Reporter: Joseph K. Bradley
>            Priority: Minor
>
> We currently use SVD for inverting the Gaussian's covariance matrix and 
> computing the determinant.  SVD is numerically stable but slow.  We could 
> experiment with Cholesky, etc. to figure out a better option, or a better 
> option for certain settings.



