Are you sure that Rij > 2*r in your calculations?

You should use I instead of eye(3), but probably the best solution is to
avoid the explicit calculation of the outer product. You can probably write
your problem as a rank one update of a view of pry

Med venlig hilsen

Andreas Noack

2014-10-24 10:26 GMT-04:00 <[email protected]>:

> So, I've been gradually porting some of my more performance sensitive
> MATLAB research code to Julia, with great results so far. One of my current
> projects spends about 80% of the computation time calculating the
> Rotne-Prager-Yamakawa tensor at each time step, so I'm quite motivated to
> optimize that particular step of the computation as much as possible.
> Rewriting the code in Julia alone resulted in an order of magnitude
> speedup. Writing my own implementation of the dyadic product of two vectors
> gave me another factor of 2 (which I found rather surprising, but still),
> as did heavy use of @simd and @inbounds annotations where appropriate.
> Despite these successes, I still feel like there is more performance to be
> gained here. For one, the code still spends a lot of the time on garbage
> collection. Also, running julia with --track-allocation=user gave me some
> perplexing results.
>
> First things first, the relevant code:
> function outer_product(a, b, n)
>     c = zeros(n, n)
>     @simd for i = 1 : n
>         @simd for j = 1 : n
>             @inbounds c[i, j] = a[i] * b[j]
>         end
>     end
>     c
> end
>
> function rotne_prager(x, y, z, r)
>     const p = length(x) - 1
>     rpy = zeros(3*(p+1), 3*(p+1))
>     rvec = zeros(3)
>     @simd for i = 1 : p+1
>         @simd for j = 1 : p+1
>             @inbounds rvec[1] = x[j] - x[i]
>             @inbounds rvec[2] = y[j] - y[i]
>             @inbounds rvec[3] = z[j] - z[i]
>             Rij = sqrt(rvec[1]*rvec[1] + rvec[2]*rvec[2] + rvec[3]*rvec[3
> ])
>             distance_ratio = r / Rij
>             if (Rij > 2*r)
>                 @inbounds rpy[3*j-2:3*j, 3*i-2:3*i] = (0.75 *
> distance_ratio *
>                     ((1.0 + 2.0/3.0 * distance_ratio*distance_ratio) * eye
> (3) +
>                     (1.0 - 2.0*distance_ratio*distance_ratio) *
> outer_product(rvec, rvec, 3)))
>             else
>                 @inbounds rpy[3*j-2:3*j, 3*i-2:3*i] = (
>                     (1.0 - 9.0/32.0 / distance_ratio) * eye(3) +
>                     3.0/32.0 / distance_ratio * outer_product(rvec, rvec,
> 3))
>             end
>         end
>     end
>     rpy
> end
>
> function rotne_bench(n)
>     x = rand(n)
>     y = rand(n)
>     z = rand(n)
>     r = 0.01
>     rotne_prager(x, y, z, r);
> end
>
> rotne_bench(1000);
> @time (for i=1:10; rotne_bench(1000); end)
>
> Running this yields the following output on one of my machines:
> elapsed time: 13.440131479 seconds (8718927056 bytes allocated, 36.99% gc
> time)
>
> I then tried to add proper type annotations everywhere, but this did not
> change anything in the results, so I'm assuming Julia's type inference
> system is doing a great job here. Next, I tried to look at the allocations,
> using the --track-allocation=user switch (by the way, this did not seem to
> produce a *jl.mem file anywhere on my Windows machines, but worked fine on
> a Linux box, is this a known behavior?). This yielded the following output:
>         - function outer_product(a, b, n)
> -70967296     c = zeros(n, n)
>         0     @simd for i = 1 : n
>         0         @simd for j = 1 : n
>         0             @inbounds c[i, j] = a[i] * b[j]
>         -         end
>         -     end
>         0     c
>         - end
>         -
>         - function rotne_prager(x, y, z, r)
>         0     const p = length(x) - 1
> 792000528     rpy = zeros(3*(p+1), 3*(p+1))
>       880     rvec = zeros(3)
>         0     @simd for i = 1 : p+1
>         0         @simd for j = 1 : p+1
>         0             @inbounds rvec[1] = x[j] - x[i]
>         0             @inbounds rvec[2] = y[j] - y[i]
>         0             @inbounds rvec[3] = z[j] - z[i]
>         0             Rij = sqrt(rvec[1]*rvec[1] + rvec[2]*rvec[2] + rvec[
> 3]*rvec[3])
>         0             distance_ratio = r / Rij
>         0             if (Rij > 2*r)
>         0                 @inbounds rpy[3*j-2:3*j, 3*i-2:3*i] = (0.75 *
> distance_ratio *
>         -                     ((1.0 + 2.0/3.0 * distance_ratio*
> distance_ratio) * eye(3) +
>         -                     (1.0 - 2.0*distance_ratio*distance_ratio) *
> outer_product(rvec, rvec, 3)))
>         -             else
> 279582208                 @inbounds rpy[3*j-2:3*j, 3*i-2:3*i] = (
>         -                     (1.0 - 9.0/32.0 / distance_ratio) * eye(3) +
>         -                     3.0/32.0 / distance_ratio * outer_product(
> rvec, rvec, 3))
>         -             end
>         -         end
>         -     end
>         0     rpy
>         0 end
>         -
>         0 function rotne_bench(n)
>         0     x = rand(n)
>         0     y = rand(n)
>         0     z = rand(n)
>         0     r = 0.01
>         0     rotne_prager(x, y, z, r);
>         - end
>         -
>         - rotne_bench(1000);
>         0 @time (for i=1:10; rotne_bench(1000); end)
>
> I find myself somewhat confused by these results, specifically line 27.
> Why would the second expression constructing the rpy matrix cause a huge
> number of allocations when the first did not? They seem virtually identical
> from a programming point of view, with minor differences in numerical
> coefficients. I feel like, in my limited understanding of a language I'm
> still learning, I'm missing or misunderstanding something fairly obvious.
> Any comments, suggestions and criticisms are welcome, both with the regards
> to the performance issues as well as my coding style etc.
>
>

Reply via email to