I'm currently avoiding the use of FMMs because the systems I run are fairly small (colloidal soft matter, typically involving a few hundreds of spheres in a small number of aggregates, so each time step is quite fast, but the dynamics take a long time to resolve) and I'd rather not sacrifice any precision if I don't have to. Neighbor lists might be an interesting option to consider, however.
I'm not entirely sure what would be gained by providing a function that computes the tensor's action on a vector instead of computing it directly. In a Stokes flow, the velocity of an arbitrary assortment of spheres can be approximately expressed as v = rpy * force_density, which is basically what I'm doing in this case. Assuming the force_density vector isn't sparse, wouldn't you have to compute every element of the tensor anyway? On Friday, October 24, 2014 7:43:56 PM UTC+3, Jiahao Chen wrote: > > Without knowing much more about your specific problem and use cases, you > may also want to consider: > > - never computing the tensor directly, but provide an indirect > representation of it by providing a function that computes its action on a > vector > - using a fast multipole algorithm to determine the distance cutoff > - reducing the number of explicit distance computations using neighbor > lists > > Thanks, > > Jiahao Chen > Staff Research Scientist > MIT Computer Science and Artificial Intelligence Laboratory > > On Fri, Oct 24, 2014 at 12:38 PM, Jiahao Chen <[email protected] > <javascript:>> wrote: > >> Are you sure that @simd and @inbounds give you any performance gains? On >> my laptop removing all those annotations resulted in essentially no change >> in performance. >> >> Computing the outer product as a*b' is currently slow because it does not >> use the specialized method for computing outer products. Instead, b' turns >> the n-vector into a 1xn-row matrix, then a*b' turns a from a n-vector to a >> nx1-column matrix, then does generic matrix-matrix multiplication (dgemm). >> You can verify this yourself by running >> >> @which a*b' >> >> and looking at the method of * that is being dispatched on. >> >> As Andreas said, you can try the rank-1 update; one way to use BLAS's >> function for this is >> >> outer_product(a, b, n) = Base.LinAlg.BLAS.ger!(1.0, a, b, zeros(n, n)) >> >> but for 3x3 outer products writing your own kernel may very well be >> faster because you avoid the cost of making a library call. >> >> For me the biggest speedup came from rewriting your inner kernel as >> >> rpy[3*j-2:3*j, 3*i-2:3*i] = if (Rij > 2*r) >> (0.75 * distance_ratio * >> ((1.0 + 2.0/3.0 * distance_ratio*distance_ratio) * I + >> (1.0 - 2.0*distance_ratio*distance_ratio) * >> outer_product(rvec, rvec, 3))) >> else >> (1.0 - 9.0/32.0 / distance_ratio) * I + >> 3.0/32.0 / distance_ratio * outer_product(rvec, rvec, >> 3) >> end >> >> - The I symbol is an abstract identity matrix that avoids allocating an >> explicit matrix. >> - The compiler seems happier when I move the assignment outside the if >> statement. >> >> Thanks, >> >> Jiahao Chen >> Staff Research Scientist >> MIT Computer Science and Artificial Intelligence Laboratory >> >> On Fri, Oct 24, 2014 at 11:24 AM, <[email protected] <javascript:>> >> wrote: >> >>> I'm absolutely sure that Rij > 2*r most of the time, yes. This is >>> obvious if you consider that rand(n) returns values in the range [0, 1] and >>> r = 0.01. I have also verified this empirically. I'll try to explore the >>> view update idea further, I've never used that approach before. >>> >>> >>> On Friday, October 24, 2014 5:46:37 PM UTC+3, Andreas Noack wrote: >>>> >>>> Are you sure that Rij > 2*r in your calculations? >>>> >>>> You should use I instead of eye(3), but probably the best solution is >>>> to avoid the explicit calculation of the outer product. You can probably >>>> write your problem as a rank one update of a view of pry >>>> >>>> Med venlig hilsen >>>> >>>> Andreas Noack >>>> >>> >> >
