Ah right, that seems to be related to the problem.  It's a little better if

    L = chol(A, :U)
    L = full(L)

is replaced with

    L = full(chol(A, :U))

but the big improvement comes from putting in a type annotation there:

    L = full(chol(A, :U)) :: typeof(A)

I'm not sure if that's the right way to handle it, but it improves the
speed by a factor of two and reduces the memory allocation to
something reasonable.  (It also makes the devectorized version *much*
faster than before, but still not as fast as the version using BLAS,
which isn't very surprising.)

Any idea why type inference is failing on full(chol(A, :U))?

~Chris

On Wed, Jun 4, 2014 at 4:04 AM, Kevin Squire <[email protected]> wrote:
> One issue might be that you change the type of L, which I believe boxes it
> (but someone closer to the compiler will have to verify).
>
> Maybe try using a different variable for the result of the decomposition?
>
> Cheers, Kevin
>
> On Tuesday, June 3, 2014, Chris Foster <[email protected]> wrote:
>>
>> On Wed, Jun 4, 2014 at 2:12 AM, Chris Foster <[email protected]> wrote:
>> > fiddling with Base.BLAS.dot only got me as far as a segfault so far.
>>
>> Actually I think I've fixed that now in the gist and using BLAS.dot
>> directly is faster, though still not very impressive.  According to
>> @time, I've still got some mystery allocations somewhere, but I can't
>> see where.  Ideas anyone?
