On Thu, Jun 5, 2014 at 10:52 AM, Chris Foster <[email protected]> wrote:
> I'm still trying to make it faster too - I think a lot of the
> dot product calls in the current version can be expressed in terms of
> calls to gemm(), which might give quite a nice speedup if I can get
> the details right.

After writing a few explicit loops and reformulating the intensive
part of the computation in terms of calls to BLAS.gemv!(), the current
version is another 4x faster. On 1000x1000 matrices with a single
thread, that brings it to about 7x slower than plain chol().  Because
I'm now using calls to BLAS.gemv!(), the computation does benefit
somewhat from throwing more threads at it, though not as much as I
would naively expect.
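
To illustrate the kind of reformulation I mean, here is a minimal sketch
for a plain real Cholesky factorization. It is not the DualNumbers code:
the function names chol_dot and chol_gemv are made up for this example,
it assumes a symmetric positive definite Matrix{Float64}, and it is
written for a Julia version where BLAS lives in the LinearAlgebra
standard library (older versions had it in Base). The first function
computes every entry with its own dot product; the second replaces all
the dot products for one column with a single BLAS.gemv!() call.

```julia
using LinearAlgebra

# Entry-by-entry version: one dot product per entry of L.
function chol_dot(A::Matrix{Float64})
    n = size(A, 1)
    L = zeros(n, n)
    for j in 1:n, i in j:n
        # Subtract the contribution of the columns already computed.
        s = A[i, j] - dot(L[i, 1:j-1], L[j, 1:j-1])
        if i == j
            L[j, j] = sqrt(s)
        else
            L[i, j] = s / L[j, j]
        end
    end
    return L
end

# Column version: the dot products for column j collapse into one
# matrix-vector product, v = A[j:n, j] - L[j:n, 1:j-1] * L[j, 1:j-1].
function chol_gemv(A::Matrix{Float64})
    n = size(A, 1)
    L = zeros(n, n)
    for j in 1:n
        v = A[j:n, j]
        if j > 1
            # The slices are copies, which keeps the sketch simple
            # at the cost of extra allocation.
            BLAS.gemv!('N', -1.0, L[j:n, 1:j-1], L[j, 1:j-1], 1.0, v)
        end
        L[j, j] = sqrt(v[1])
        L[j+1:n, j] = v[2:end] ./ L[j, j]
    end
    return L
end

# Small symmetric positive definite test matrix.
A = [4.0 2.0 2.0; 2.0 5.0 3.0; 2.0 3.0 6.0]
L = chol_gemv(A)
```

Both functions return the lower triangular factor, so L * L' should
reproduce A up to rounding. The gemv form hands BLAS one larger chunk
of work per column, which is where any multithreading can come in.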

I'll submit a pull request to DualNumbers.jl and we can discuss how to
make it work properly with cholfact!().

~Chris
