Ah right, that seems to be related to the problem. It's a little better if

    L = chol(A, :U)
    L = full(L)

is replaced with

    L = full(chol(A, :U))

but the big improvement comes from putting a type annotation there:

    L = full(chol(A, :U)) :: typeof(A)

I'm not sure if that's the right way to handle it, but it improves the
speed by a factor of two and reduces the memory allocation to
something reasonable. (It also makes the devectorized version *much*
faster than before, but not as fast as the version using BLAS, which
isn't very surprising.)
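
For readers on current Julia, where `chol` and `full` have since been superseded by `cholesky` and `Matrix`, a rough equivalent of the annotated line might look like the sketch below. The function name `upper_factor` and the test matrix are hypothetical, not from the thread. It also follows Kevin's suggestion of keeping the factorization and the dense copy in separate variables. On recent Julia versions this is usually inferred fine even without the `::typeof(A)` assertion, so there the annotation mainly documents intent; `@code_warntype upper_factor(A)` is one way to check.

```julia
using LinearAlgebra

# Hypothetical helper (not from the original thread): returns the dense
# upper Cholesky factor of a symmetric positive-definite matrix A.
function upper_factor(A::Matrix{Float64})
    F = cholesky(Symmetric(A, :U))   # factorization object, kept in its own variable
    U = Matrix(F.U)::typeof(A)       # dense copy; the assertion pins the type to Matrix{Float64}
    return U
end

A = [4.0 2.0; 2.0 3.0]
U = upper_factor(A)                  # satisfies U' * U ≈ A
```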
Any idea why type inference is failing on full(chol(A, :U))?
~Chris
On Wed, Jun 4, 2014 at 4:04 AM, Kevin Squire <[email protected]> wrote:
> One issue might be that you change the type of L, which I believe boxes it
> (but someone closer to the compiler will have to verify).
>
> Maybe try using a different variable for the result of the decomposition?
>
> Cheers, Kevin
>
> On Tuesday, June 3, 2014, Chris Foster <[email protected]> wrote:
>>
>> On Wed, Jun 4, 2014 at 2:12 AM, Chris Foster <[email protected]> wrote:
>> > Fiddling with Base.BLAS.dot has only got me as far as a segfault.
>>
>> Actually, I think I've now fixed that in the gist, and using BLAS.dot
>> directly is faster, though still not very impressive. According to
>> @time, I still have some mystery allocations somewhere, but I can't
>> see where. Any ideas?
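
One way to hunt for mystery allocations on current Julia is to put the hot path in a function, call it once so compilation isn't counted, and then measure a second call with `@allocated` (or `@time`). The sketch below is hypothetical, not the code from the gist: it assumes a kernel `quadform!` that computes x' * A * x with a caller-supplied buffer, using `mul!` for the matrix-vector product and the documented `LinearAlgebra.BLAS.dot(n, X, incx, Y, incy)` form for the dot product.

```julia
using LinearAlgebra

# Hypothetical kernel (not the gist's code): computes x' * A * x, writing
# A * x into the caller-supplied buffer `tmp` so the hot path needs no temporaries.
function quadform!(tmp::Vector{Float64}, A::Matrix{Float64}, x::Vector{Float64})
    mul!(tmp, A, x)                                       # tmp = A * x, in place
    return LinearAlgebra.BLAS.dot(length(x), x, 1, tmp, 1) # x' * tmp via BLAS ddot
end

A = [4.0 2.0; 2.0 3.0]
x = [1.0, 2.0]
tmp = similar(x)

q = quadform!(tmp, A, x)                  # warm-up call: compiles the method
bytes = @allocated quadform!(tmp, A, x)   # bytes allocated by an already-compiled call
```

If `bytes` is not zero, `@code_warntype quadform!(tmp, A, x)` is a reasonable next step for spotting type instabilities. Calling from global scope with non-constant globals may itself add a small allocation for boxing the result, so measuring inside another function gives a cleaner number.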