Re: [julia-users] Trying to optimize numerical code - peculiar allocation data

dextorious Fri, 24 Oct 2014 17:12:32 -0700

I'm currently avoiding the use of FMMs because the systems I run are fairly 
small (colloidal soft matter, typically involving a few hundreds of spheres 
in a small number of aggregates, so each time step is quite fast, but the 
dynamics take a long time to resolve) and I'd rather not sacrifice any 
precision if I don't have to. Neighbor lists might be an interesting option 
to consider, however.


I'm not entirely sure what would be gained by providing a function that 
computes the tensor's action on a vector instead of computing it directly. 
In a Stokes flow, the velocity of an arbitrary assortment of spheres can be 
approximately expressed as v = rpy * force_density, which is basically what 
I'm doing in this case. Assuming the force_density vector isn't sparse, 
wouldn't you have to compute every element of the tensor anyway?

On Friday, October 24, 2014 7:43:56 PM UTC+3, Jiahao Chen wrote:
>
> Without knowing much more about your specific problem and use cases, you 
> may also want to consider:
>
> - never computing the tensor directly, but provide an indirect 
> representation of it by providing a function that  computes its action on a 
> vector
> - using a fast multipole algorithm to determine the distance cutoff
> - reducing the number of explicit distance computations using neighbor 
> lists
>
> Thanks,
>
> Jiahao Chen
> Staff Research Scientist
> MIT Computer Science and Artificial Intelligence Laboratory
>
> On Fri, Oct 24, 2014 at 12:38 PM, Jiahao Chen <[email protected] 
> <javascript:>> wrote:
>
>> Are you sure that @simd and @inbounds give you any performance gains? On 
>> my laptop removing all those annotations resulted in essentially no change 
>> in performance.
>>
>> Computing the outer product as a*b' is currently slow because it does not 
>> use the specialized method for computing outer products. Instead, b' turns 
>> the n-vector into a 1xn-row matrix, then a*b' turns a from a n-vector to a 
>> nx1-column matrix, then does generic matrix-matrix multiplication (dgemm). 
>> You can verify this yourself by running
>>
>> @which a*b'
>>
>> and looking at the method of * that is being dispatched on.
>>
>> As Andreas said, you can try the rank-1 update; one way to use BLAS's 
>> function for this is
>>
>> outer_product(a, b, n) = Base.LinAlg.BLAS.ger!(1.0, a, b, zeros(n, n))
>>
>> but for 3x3 outer products writing your own kernel may very well be 
>> faster because you avoid the cost of making a library call.
>>
>> For me the biggest speedup came from rewriting your inner kernel as
>>
>>             rpy[3*j-2:3*j, 3*i-2:3*i] = if (Rij > 2*r)
>>                 (0.75 * distance_ratio *
>>                     ((1.0 + 2.0/3.0 * distance_ratio*distance_ratio) * I +
>>                     (1.0 - 2.0*distance_ratio*distance_ratio) * 
>> outer_product(rvec, rvec, 3)))
>>             else
>>                     (1.0 - 9.0/32.0 / distance_ratio) * I +
>>                     3.0/32.0 / distance_ratio * outer_product(rvec, rvec, 
>> 3)
>>             end
>>
>> - The I symbol is an abstract identity matrix that avoids allocating an 
>> explicit matrix.
>> - The compiler seems happier when I move the assignment outside the if 
>> statement.
>>
>> Thanks,
>>
>> Jiahao Chen
>> Staff Research Scientist
>> MIT Computer Science and Artificial Intelligence Laboratory
>>
>> On Fri, Oct 24, 2014 at 11:24 AM, <[email protected] <javascript:>> 
>> wrote:
>>
>>> I'm absolutely sure that Rij > 2*r most of the time, yes. This is 
>>> obvious if you consider that rand(n) returns values in the range [0, 1] and 
>>> r = 0.01. I have also verified this empirically. I'll try to explore the 
>>> view update idea further, I've never used that approach before.
>>>
>>>
>>> On Friday, October 24, 2014 5:46:37 PM UTC+3, Andreas Noack wrote:
>>>>
>>>> Are you sure that Rij > 2*r in your calculations? 
>>>>
>>>> You should use I instead of eye(3), but probably the best solution is 
>>>> to avoid the explicit calculation of the outer product. You can probably 
>>>> write your problem as a rank one update of a view of pry
>>>>
>>>> Med venlig hilsen
>>>>
>>>> Andreas Noack
>>>>
>>>
>>
>

Re: [julia-users] Trying to optimize numerical code - peculiar allocation data

Reply via email to