* The variable r⁻² in the code below should of course be named r², since sumabs2(r) computes the squared norm |r|².
Note that the following very naive version of LJ(r)
function LJ(r)
    r² = sumabs2(r)
    J = 1.0/(r²)^6 - 2.0/(r²)^3
    ∇ᵣJ = (-12.0/(r²)^6 + 12.0/(r²)^3) / r²
    return J, ∇ᵣJ
end
results in code that runs in 0.42s, which is not terrible (and similar
performance would be expected for more complicated Lennard-Jones-type
potentials that are not of the 12-6 form).
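As a quick cross-check of the algebra (a sketch in Python rather than Julia; the names lj_naive and lj_fast are mine, just for this check), the power-based form above and the multiplication-only form from my earlier message should agree to rounding error:

```python
def lj_naive(r2):
    # direct 12-6 form: J = (r^2)^-6 - 2 (r^2)^-3, gradient w.r.t. r scaled by r
    J = 1.0 / r2**6 - 2.0 / r2**3
    dJ = (-12.0 / r2**6 + 12.0 / r2**3) / r2
    return J, dJ

def lj_fast(r2):
    # the same two quantities using only multiplications and one division
    r6i = 1.0 / (r2 * r2 * r2)
    J = r6i * r6i - 2.0 * r6i
    dJ = -12.0 * (r6i * r6i - r6i) / r2
    return J, dJ

# the two forms are algebraically identical, so they should match to ~1e-12
for r2 in (0.81, 1.0, 1.69, 4.0):
    Ja, dJa = lj_naive(r2)
    Jb, dJb = lj_fast(r2)
    assert abs(Ja - Jb) < 1e-12 * max(1.0, abs(Ja))
    assert abs(dJa - dJb) < 1e-12 * max(1.0, abs(dJa))
```

At r² = 1 (the potential minimum) both forms give J = -1 and a vanishing gradient, which is a convenient spot check.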
Thanks,
Jiahao Chen
Staff Research Scientist
MIT Computer Science and Artificial Intelligence Laboratory
On Mon, Dec 1, 2014 at 9:40 PM, Jiahao Chen <[email protected]> wrote:
> I took a look at the Lennard-Jones example. On top of the memory
> allocation from array slicing, there is also a lot of overhead in passing
> lj(r) as an input function phi. There is also some overhead from computing
> size(x) in the range for the inner loop.
>
> Consider this version of the code, which is "C style" in that the loops
> are written out explicitly rather than vectorized, but otherwise is not that
> far from the original lj_pretty:
>
> # definition of the Lennard-Jones potential
> function LJ(r)
>     r⁻² = sumabs2(r)
>     r⁻⁶ = 1.0/(r⁻²*r⁻²*r⁻²)
>     J = r⁻⁶*r⁻⁶ - 2.0r⁻⁶
>     ∇ᵣJ = -12. * (r⁻⁶*r⁻⁶ - r⁻⁶) / r⁻²
>     J, ∇ᵣJ #<--
> end
>
> function lj_cstyle2(x)
>     d, N = size(x)
>     E = 0.0
>     ∇E = zeros(x) #note that zeros supports this form too
>     r = zeros(d)
>     @inbounds for n = 1:(N-1), k = (n+1):N
>         for i = 1:d
>             r[i] = x[i,k]-x[i,n]
>         end
>         J, ∇ᵣJ = LJ(r) #<--
>         E += J
>         for i = 1:d
>             ∇E[i, k] += ∇ᵣJ * r[i]
>             ∇E[i, n] -= ∇ᵣJ * r[i]
>         end
>     end
>     E, ∇E #return keyword not strictly required
> end
>
> For me, lj_cstyle2 runs in 83 ms, compared to 66 ms for lj_cstyle.
>
> There is some overhead from constructing and unpacking the tuples in the
> two lines marked with #<--, but I think it is unavoidable if you want to
> have a single external function that computes both the potential and its
> gradient.
>
> Note that this version of LJ(r) computes two numbers, not a number and a
> vector. For any central potential the gradient along r, ∇ᵣJ, is sufficient
> to characterize the entire force ∇E, so there is not much loss of
> generality in not returning the force in the original coordinates. In this
> case, the mathematical structure of the problem is not entirely at odds
> with its implementation in code.
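> In symbols (using the same names as the code): writing the pair energy as
> J = J(s) with s = r², the chain rule gives
>
> ```latex
> \frac{\partial J}{\partial r_i}
>   = \frac{dJ}{ds}\,\frac{\partial s}{\partial r_i}
>   = \left(2\,\frac{dJ}{ds}\right) r_i
>   = \nabla_r J \, r_i,
> \qquad
> \nabla_r J := 2\,\frac{dJ}{ds}
>   = -\,\frac{12\left(s^{-6} - s^{-3}\right)}{s},
> ```
>
> which is exactly the product ∇ᵣJ * r[i] that the inner loop of lj_cstyle2
> accumulates into ∇E.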
>
> Thanks,
>
> Jiahao Chen
> Staff Research Scientist
> MIT Computer Science and Artificial Intelligence Laboratory
>
> On Mon, Dec 1, 2014 at 12:14 PM, Christoph Ortner <
> [email protected]> wrote:
>
>> Hi Steven,
>> Thanks for the feedback - yes it was intentional, but I am definitely
>> planning to extend the notebook with performance enhancements.
>> I was not yet aware of this distinction between Int and Integer. Thanks
>> for pointing it out.
>> Christoph
>>
>>
>> On Monday, 1 December 2014 15:49:10 UTC, Steven G. Johnson wrote:
>>>
>>> I notice that in your finite-element notebook you use:
>>>
>>> T = zeros(Integer, 3, 2*N^2)
>>>
>>>
>>> You really want to use Int in cases like this (Int = Int64 on 64-bit
>>> machines and Int32 on 32-bit machines). Using Integer means that you have
>>> an array of pointers to generic integer containers, whose type must be
>>> checked at runtime (e.g. T[i] could be an Int32, Int64, BigInt, etc.).
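>>> A rough analogy in Python (illustrative only, using the stdlib array
>>> module versus a plain list): an array('q') stores 8-byte machine integers
>>> contiguously, while a list holds pointers to boxed int objects whose types
>>> must be checked at runtime, much like Array{Int} vs Array{Integer} in Julia.

```python
import array

# Illustrative analogy (Python, not Julia): machine ints stored inline
# versus boxed integer objects behind pointers.
inline = array.array('q', [0] * 6)   # 'q' = signed 64-bit C long long
boxed = [0] * 6                      # list of references to Python int objects

print(inline.itemsize)   # bytes per element (8), stored contiguously
print(type(boxed[0]))    # each list element is a full Python object
```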
>>>
>>> (You also have very Matlab-like code that allocates zillions of little
>>> temporary arrays in your inner loops. I assume this is intentional, but it
>>> is definitely suboptimal for performance, possibly by an order of magnitude.)
>>>
>>
>