Without knowing much more about your specific problem and use cases, you
may also want to consider:

- never computing the tensor directly, but provide an indirect
representation of it by providing a function that  computes its action on a
vector
- using a fast multipole algorithm to determine the distance cutoff
- reducing the number of explicit distance computations using neighbor lists

Thanks,

Jiahao Chen
Staff Research Scientist
MIT Computer Science and Artificial Intelligence Laboratory

On Fri, Oct 24, 2014 at 12:38 PM, Jiahao Chen <[email protected]> wrote:

> Are you sure that @simd and @inbounds give you any performance gains? On
> my laptop removing all those annotations resulted in essentially no change
> in performance.
>
> Computing the outer product as a*b' is currently slow because it does not
> use the specialized method for computing outer products. Instead, b' turns
> the n-vector into a 1xn-row matrix, then a*b' turns a from a n-vector to a
> nx1-column matrix, then does generic matrix-matrix multiplication (dgemm).
> You can verify this yourself by running
>
> @which a*b'
>
> and looking at the method of * that is being dispatched on.
>
> As Andreas said, you can try the rank-1 update; one way to use BLAS's
> function for this is
>
> outer_product(a, b, n) = Base.LinAlg.BLAS.ger!(1.0, a, b, zeros(n, n))
>
> but for 3x3 outer products writing your own kernel may very well be faster
> because you avoid the cost of making a library call.
>
> For me the biggest speedup came from rewriting your inner kernel as
>
>             rpy[3*j-2:3*j, 3*i-2:3*i] = if (Rij > 2*r)
>                 (0.75 * distance_ratio *
>                     ((1.0 + 2.0/3.0 * distance_ratio*distance_ratio) * I +
>                     (1.0 - 2.0*distance_ratio*distance_ratio) *
> outer_product(rvec, rvec, 3)))
>             else
>                     (1.0 - 9.0/32.0 / distance_ratio) * I +
>                     3.0/32.0 / distance_ratio * outer_product(rvec, rvec,
> 3)
>             end
>
> - The I symbol is an abstract identity matrix that avoids allocating an
> explicit matrix.
> - The compiler seems happier when I move the assignment outside the if
> statement.
>
> Thanks,
>
> Jiahao Chen
> Staff Research Scientist
> MIT Computer Science and Artificial Intelligence Laboratory
>
> On Fri, Oct 24, 2014 at 11:24 AM, <[email protected]> wrote:
>
>> I'm absolutely sure that Rij > 2*r most of the time, yes. This is obvious
>> if you consider that rand(n) returns values in the range [0, 1] and r =
>> 0.01. I have also verified this empirically. I'll try to explore the view
>> update idea further, I've never used that approach before.
>>
>>
>> On Friday, October 24, 2014 5:46:37 PM UTC+3, Andreas Noack wrote:
>>>
>>> Are you sure that Rij > 2*r in your calculations?
>>>
>>> You should use I instead of eye(3), but probably the best solution is to
>>> avoid the explicit calculation of the outer product. You can probably write
>>> your problem as a rank one update of a view of pry
>>>
>>> Med venlig hilsen
>>>
>>> Andreas Noack
>>>
>>
>

Reply via email to