Hello,

I am new to the Apache Spark project but I would like to contribute to
issue SPARK-7210. There has been some conversation on that issue and I
would like to take a shot at it. Before doing so, I want to run my plan by
everyone.

From the description and the comments, the goal is to test other methods of
computing the MVN pdf. The stated concern is that the SVD used is slow
despite being numerically stable, and that speed and stability may
become problematic as the number of features grows.

In the comments, Feynman posted an R recipe for computing the pdf using a
Cholesky trick. I would like to compute the pdf by following that recipe
while using the Cholesky implementation found in ScalaNLP Breeze. To test
speed I would compute the pdf using the original method and the Cholesky
method across a range of simulated datasets with growing n (observations)
and p (features). To test stability I would compute the pdf on simulated
features with some multicollinearity.
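
To make the idea concrete, here is a rough sketch of the Cholesky approach
as I understand it. It is written in plain Python only so that it is
self-contained; the real change would use Breeze. The function names
(cholesky, forward_solve, mvn_logpdf) are placeholders of my own, and this
reflects my reading of the general technique, not a transcription of the R
recipe from the comments. With Sigma = L L^T, the quadratic form becomes
|z|^2 where L z = x - mu, and log det(Sigma) = 2 * sum(log L_ii).

```python
import math


def cholesky(a):
    """Return lower-triangular L with L * L^T == a.

    Assumes `a` is a symmetric positive definite matrix (list of lists).
    """
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0.0:
                    # A non-positive pivot means the matrix is singular
                    # or indefinite, e.g. perfectly collinear features.
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(d)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def forward_solve(L, b):
    """Solve L z = b for z by forward substitution (L lower-triangular)."""
    z = []
    for i in range(len(b)):
        s = sum(L[i][k] * z[k] for k in range(i))
        z.append((b[i] - s) / L[i][i])
    return z


def mvn_logpdf(x, mu, sigma):
    """Log density of N(mu, sigma) at x, without forming sigma^{-1}."""
    p = len(x)
    L = cholesky(sigma)
    diff = [xi - mi for xi, mi in zip(x, mu)]
    z = forward_solve(L, diff)
    maha = sum(zi * zi for zi in z)  # (x - mu)^T sigma^{-1} (x - mu)
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(p))
    return -0.5 * (p * math.log(2.0 * math.pi) + log_det + maha)
```

Note that in this sketch a perfectly collinear covariance such as
[[1, 1], [1, 1]] makes the factorization fail with an error, which is
exactly the kind of case the stability test would need to probe.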

Does this sound like a good starting point? Am I thinking of this correctly?

Given that this is my first attempt at contributing to an Apache project,
might it be a good idea to do this through the Mentor Programme?

Please let me know how this sounds, and I can provide some personal details
about my experience and motivations.

Thanks,

Chris
