Without knowing much more about your specific problem and use cases, you may also want to consider:
- never computing the tensor directly, but provide an indirect representation of it by providing a function that computes its action on a vector - using a fast multipole algorithm to determine the distance cutoff - reducing the number of explicit distance computations using neighbor lists Thanks, Jiahao Chen Staff Research Scientist MIT Computer Science and Artificial Intelligence Laboratory On Fri, Oct 24, 2014 at 12:38 PM, Jiahao Chen <[email protected]> wrote: > Are you sure that @simd and @inbounds give you any performance gains? On > my laptop removing all those annotations resulted in essentially no change > in performance. > > Computing the outer product as a*b' is currently slow because it does not > use the specialized method for computing outer products. Instead, b' turns > the n-vector into a 1xn-row matrix, then a*b' turns a from a n-vector to a > nx1-column matrix, then does generic matrix-matrix multiplication (dgemm). > You can verify this yourself by running > > @which a*b' > > and looking at the method of * that is being dispatched on. > > As Andreas said, you can try the rank-1 update; one way to use BLAS's > function for this is > > outer_product(a, b, n) = Base.LinAlg.BLAS.ger!(1.0, a, b, zeros(n, n)) > > but for 3x3 outer products writing your own kernel may very well be faster > because you avoid the cost of making a library call. > > For me the biggest speedup came from rewriting your inner kernel as > > rpy[3*j-2:3*j, 3*i-2:3*i] = if (Rij > 2*r) > (0.75 * distance_ratio * > ((1.0 + 2.0/3.0 * distance_ratio*distance_ratio) * I + > (1.0 - 2.0*distance_ratio*distance_ratio) * > outer_product(rvec, rvec, 3))) > else > (1.0 - 9.0/32.0 / distance_ratio) * I + > 3.0/32.0 / distance_ratio * outer_product(rvec, rvec, > 3) > end > > - The I symbol is an abstract identity matrix that avoids allocating an > explicit matrix. > - The compiler seems happier when I move the assignment outside the if > statement. > > Thanks, > > Jiahao Chen > Staff Research Scientist > MIT Computer Science and Artificial Intelligence Laboratory > > On Fri, Oct 24, 2014 at 11:24 AM, <[email protected]> wrote: > >> I'm absolutely sure that Rij > 2*r most of the time, yes. This is obvious >> if you consider that rand(n) returns values in the range [0, 1] and r = >> 0.01. I have also verified this empirically. I'll try to explore the view >> update idea further, I've never used that approach before. >> >> >> On Friday, October 24, 2014 5:46:37 PM UTC+3, Andreas Noack wrote: >>> >>> Are you sure that Rij > 2*r in your calculations? >>> >>> You should use I instead of eye(3), but probably the best solution is to >>> avoid the explicit calculation of the outer product. You can probably write >>> your problem as a rank one update of a view of pry >>> >>> Med venlig hilsen >>> >>> Andreas Noack >>> >> >
