Hello, I am new to the Apache Spark project, but I would like to contribute to issue SPARK-7210. There has been some conversation on that issue and I would like to take a shot at it. Before doing so, I want to run my plan by everyone.
From the description and the comments, the goal is to test other methods of computing the MVN pdf. The stated concern is that the SVD currently used is slow despite being numerically stable, and that both speed and stability may become problematic as the number of features grows. In the comments, Feynman posted an R recipe for computing the pdf using a Cholesky trick. I would like to compute the pdf by following that recipe while using the Cholesky implementation in ScalaNLP Breeze.

To test speed, I would evaluate the pdf with both the original method and the Cholesky method across a range of simulated datasets with growing n and p. To test stability, I would evaluate the pdf on simulated features with some multicollinearity.

Does this sound like a good starting point? Am I thinking about this correctly? Given that this is my first attempt at contributing to an Apache project, would it be a good idea to do this through the Mentor Programme? Please let me know what you think, and I can provide some personal details about my experience and motivations.

Thanks,
Chris
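P.S. For concreteness, below is a rough, language-agnostic sketch of the standard Cholesky approach to the MVN log-density. It is plain Python rather than Spark or Breeze code, the function names (`cholesky`, `forward_solve`, `mvn_logpdf`) are hypothetical, and I am assuming it matches the idea behind the R recipe in the JIRA comments without having checked it line by line. With Sigma = L L^T, log|Sigma| = 2 * sum(log L_ii), and the quadratic form (x - mu)^T Sigma^{-1} (x - mu) equals ||z||^2, where z solves L z = x - mu by forward substitution, so no explicit inverse or SVD is needed. In Scala the factorization step would presumably come from `breeze.linalg.cholesky` instead of the hand-written loop here.

```python
import math
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def cholesky(a: Matrix) -> Matrix:
    """Return lower-triangular L with a = L L^T (Cholesky-Banachiewicz).

    Raises ValueError if `a` is not (numerically) positive definite,
    e.g. when two features are perfectly collinear.
    """
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(d)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def forward_solve(L: Matrix, b: Vector) -> Vector:
    """Solve L z = b for z by forward substitution (L lower triangular)."""
    z: Vector = []
    for i in range(len(b)):
        s = sum(L[i][k] * z[k] for k in range(i))
        z.append((b[i] - s) / L[i][i])
    return z


def mvn_logpdf(x: Vector, mu: Vector, sigma: Matrix) -> float:
    """Log density of N(mu, sigma) at x, computed via a Cholesky factor."""
    k = len(x)
    L = cholesky(sigma)
    # z solves L z = x - mu, so ||z||^2 is the Mahalanobis term.
    z = forward_solve(L, [xi - mi for xi, mi in zip(x, mu)])
    # 0.5 * log|sigma| = sum of log diagonal entries of L.
    half_log_det = sum(math.log(L[i][i]) for i in range(k))
    maha = sum(zi * zi for zi in z)
    return -0.5 * k * math.log(2.0 * math.pi) - half_log_det - 0.5 * maha
```

For example, `mvn_logpdf([1.0, 0.0], [0.0, 0.0], [[2.0, 1.0], [1.0, 2.0]])` should equal -log(2*pi) - 0.5*log(3) - 1/3, since that covariance has determinant 3 and the Mahalanobis term is 2/3. A perfectly collinear covariance such as `[[1.0, 1.0], [1.0, 1.0]]` makes `cholesky` raise `ValueError`, which is one of the stability cases the multicollinearity tests would need to cover.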