Are you sure that @simd and @inbounds give you any performance gains? On my
laptop removing all those annotations resulted in essentially no change in
performance.
Computing the outer product as a*b' is currently slow because it does not
use the specialized method for computing outer products. Instead, b' turns
the n-vector into a 1xn-row matrix, then a*b' turns a from a n-vector to a
nx1-column matrix, then does generic matrix-matrix multiplication (dgemm).
You can verify this yourself by running
@which a*b'
and looking at the method of * that is being dispatched on.
As Andreas said, you can try the rank-1 update; one way to use BLAS's
function for this is
outer_product(a, b, n) = Base.LinAlg.BLAS.ger!(1.0, a, b, zeros(n, n))
but for 3x3 outer products writing your own kernel may very well be faster
because you avoid the cost of making a library call.
For me the biggest speedup came from rewriting your inner kernel as
rpy[3*j-2:3*j, 3*i-2:3*i] = if (Rij > 2*r)
(0.75 * distance_ratio *
((1.0 + 2.0/3.0 * distance_ratio*distance_ratio) * I +
(1.0 - 2.0*distance_ratio*distance_ratio) *
outer_product(rvec, rvec, 3)))
else
(1.0 - 9.0/32.0 / distance_ratio) * I +
3.0/32.0 / distance_ratio * outer_product(rvec, rvec, 3)
end
- The I symbol is an abstract identity matrix that avoids allocating an
explicit matrix.
- The compiler seems happier when I move the assignment outside the if
statement.
Thanks,
Jiahao Chen
Staff Research Scientist
MIT Computer Science and Artificial Intelligence Laboratory
On Fri, Oct 24, 2014 at 11:24 AM, <[email protected]> wrote:
> I'm absolutely sure that Rij > 2*r most of the time, yes. This is obvious
> if you consider that rand(n) returns values in the range [0, 1] and r =
> 0.01. I have also verified this empirically. I'll try to explore the view
> update idea further, I've never used that approach before.
>
>
> On Friday, October 24, 2014 5:46:37 PM UTC+3, Andreas Noack wrote:
>>
>> Are you sure that Rij > 2*r in your calculations?
>>
>> You should use I instead of eye(3), but probably the best solution is to
>> avoid the explicit calculation of the outer product. You can probably write
>> your problem as a rank one update of a view of pry
>>
>> Med venlig hilsen
>>
>> Andreas Noack
>>
>