On Thu, Jun 5, 2014 at 10:52 AM, Chris Foster <[email protected]> wrote:
> I'm still trying to make it faster too - I think a lot of the
> dot product calls in the current version can be expressed in terms of
> calls to gemm(), which might give quite a nice speedup if I can get
> the details right.

After writing a few explicit loops and reformulating the intensive part of the computation in terms of calls to BLAS.gemv!(), the current version is another 4x faster. That brings it to about 7x slower than plain chol() when using a single thread (1000x1000 matrices).

Because I'm now using calls to gemv, the computation does benefit somewhat from throwing more threads at it, though not as much as I would naively expect.

I'll submit a pull request to DualNumbers.jl and we can discuss how to make it work properly with cholfact!().

~Chris
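
For anyone following along, the kind of rewrite described above (many small dot products folded into one gemv call) looks roughly like the toy sketch below. This is not the code from the pull request: the function names update_with_dots! and update_with_gemv! are made up for illustration, the update itself is a stand-in for the real inner loop, and the sketch assumes a current Julia release, where BLAS lives under the LinearAlgebra standard library.

```julia
using LinearAlgebra

# Hypothetical stand-in for the inner loop: one dot product per row,
# computing y[i] -= dot(A[i, :], x).
function update_with_dots!(y, A, x)
    for i in 1:size(A, 1)
        y[i] -= dot(view(A, i, :), x)
    end
    return y
end

# The same update as a single BLAS call.
# gemv!('N', alpha, A, x, beta, y) overwrites y with alpha*A*x + beta*y,
# so alpha = -1.0 and beta = 1.0 gives y = y - A*x.
function update_with_gemv!(y, A, x)
    BLAS.gemv!('N', -1.0, A, x, 1.0, y)
    return y
end

A = [1.0 2.0; 3.0 4.0; 5.0 6.0]
x = [1.0, -1.0]

y1 = update_with_dots!(ones(3), A, x)
y2 = update_with_gemv!(ones(3), A, x)
```

Both versions give the same y; the gemv form hands the whole matrix-vector product to BLAS in one call, which is also what lets a multithreaded BLAS help at all.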
