I took a look at the Lennard-Jones example. On top of the memory allocation
from array slicing, there is also a lot of overhead in passing lj(r) as the
input function phi, and some additional overhead from computing size(x) in
the range of the inner loop.
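For reference, the costly pattern is roughly the following; this is a
hypothetical sketch, not the actual lj_pretty from the notebook, so the names
are mine:

    # hypothetical sketch of the slow pattern (not the notebook's lj_pretty)
    function lj_slices(x, phi)
        N = size(x, 2)
        E = 0.0
        for n = 1:N-1, k = n+1:size(x, 2)   # size(x,2) re-evaluated in the inner-loop range
            r = x[:,k] - x[:,n]             # slicing allocates a fresh vector for every pair
            E += phi(r)                     # phi passed in as a function argument
        end
        E
    end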
Consider this version of the code, which is "C style" in that the vector
operations are written out as explicit loops, but otherwise is not that far
from the original lj_pretty:
# definition of the Lennard-Jones potential
function LJ(r)
    r² = sumabs2(r)                      # squared distance |r|²
    r⁻⁶ = 1.0/(r²*r²*r²)
    J = r⁻⁶*r⁻⁶ - 2.0r⁻⁶                 # potential r⁻¹² - 2r⁻⁶
    ∇ᵣJ = -12.0 * (r⁻⁶*r⁻⁶ - r⁻⁶) / r²   # (1/r)·dJ/dr, so the force component i is ∇ᵣJ*r[i]
    J, ∇ᵣJ #<--
end
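# quick sanity check of LJ (the values follow directly from the formulas above):
# LJ([1.0, 0.0, 0.0]) ≈ (-1.0, 0.0), i.e. the well minimum at unit distance, zero radial force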
function lj_cstyle2(x)
    d, N = size(x)
    E = 0.0
    ∇E = zeros(x) #note that zeros supports this form too
    r = zeros(d)
    @inbounds for n = 1:(N-1), k = (n+1):N
        for i = 1:d
            r[i] = x[i,k]-x[i,n]   # pairwise separation, without allocating a slice
        end
        J, ∇ᵣJ = LJ(r) #<--
        E += J
        for i = 1:d
            ∇E[i, k] += ∇ᵣJ * r[i]
            ∇E[i, n] -= ∇ᵣJ * r[i]
        end
    end
    E, ∇E #return keyword not strictly required
end
For me, lj_cstyle2 runs in 83 ms, compared to 66 ms for lj_cstyle.
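The benchmark setup is not reproduced here; a harness along the following
lines gives such timings, with the caveat that the particle count below is a
placeholder rather than the one behind the numbers above:

    x = rand(3, 1000)      # placeholder system: 1000 particles in 3 dimensions
    lj_cstyle2(x)          # warm-up call so compilation time is excluded
    @time lj_cstyle2(x)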
There is some overhead from constructing and unpacking the tuples in the
two lines marked with #<--, but I think it is unavoidable if you want to
have a single external function that computes both the potential and its
gradient.
Note that this version of LJ(r) computes two numbers, not a number and a
vector. For any central potential the gradient along r, ∇ᵣJ, is sufficient
to characterize the entire force ∇E, so there is not much loss of
generality in not returning the force in the original coordinates. In this
case, the mathematical structure of the problem is not entirely at odds
with its implementation in code.
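Spelled out, the chain rule behind this, with ∇ᵣJ defined as in the code (the
radial derivative divided by the distance), is

$$
\frac{\partial J(|\mathbf{r}|)}{\partial r_i}
  = \frac{J'(|\mathbf{r}|)}{|\mathbf{r}|}\, r_i
  = \nabla_r J \, r_i,
\qquad
J(r) = r^{-12} - 2r^{-6}
\;\Longrightarrow\;
\nabla_r J = -\frac{12\,\bigl(r^{-12} - r^{-6}\bigr)}{r^{2}},
$$

which is exactly what the inner loop accumulates into ∇E[:, k] and ∇E[:, n].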
Thanks,
Jiahao Chen
Staff Research Scientist
MIT Computer Science and Artificial Intelligence Laboratory
On Mon, Dec 1, 2014 at 12:14 PM, Christoph Ortner <
[email protected]> wrote:
> Hi Steven,
> Thanks for the feedback - yes it was intentional, but I am definitely
> planning to extend the notebook with performance enhancements.
> I was not yet aware of this distinction between Int and Integer. Thanks
> for pointing it out.
> Christoph
>
>
> On Monday, 1 December 2014 15:49:10 UTC, Steven G. Johnson wrote:
>>
>> I notice that in your finite-element notebook you use:
>>
>> T = zeros(Integer, 3, 2*N^2)
>>
>>
>> You really want to use Int in cases like this (Int = Int64 on 64-bit
>> machines and Int32 on 32-bit machines). Using Integer means that you have
>> an array of pointers to generic integer containers, whose type must be
>> checked at runtime (e.g. T[i] could be an Int32, Int64, BigInt, etc.).
>>
>> (You also have very Matlab-like code that allocates zillions of little
>> temporary arrays in your inner loops. I assume this is intentional, but it
>> is definitely suboptimal for performance, possibly by an order of magnitude.)
>>
>
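A minimal illustration of the Int vs. Integer point in the quoted message:
Integer is an abstract type, so the array has to store boxed elements whose
concrete type is checked at runtime, whereas Int gives a flat array of
machine integers.

    A = zeros(Int, 3, 4)       # eltype Int64 on a 64-bit machine: contiguous machine integers
    B = zeros(Integer, 3, 4)   # eltype Integer (abstract): boxed elements, checked at runtime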