I know in some cases devectorizing can improve performance, have you looked 
at devectorize.jl?  Also i'm not sure if you're using NumericExtentions.jl, 
but it's use has helped performance of my code a lot.

Thanks for the great post, it's always interesting to find ways to increase 
performance in Julia.

On Monday, May 26, 2014 10:53:35 AM UTC-4, Christoph Ortner wrote:
>
>
> Here is the code that I would actually like to write. Is there a chance 
> that this will eventually be in the same ballpark as a comparable C code 
> (not an optimised C code) as well (say, factor 2-3)?
>
>
> function lj(r)
>
>     s = sum(r.^2)
>
>     J = s^(-6) - 2 * s^(-3)
>
>     dJ = -12. * (s^(-7) - s^(-4)) * r
>
>     return J, dJ
>
> end
>
>
> function energy(x, phi)
>
> E = 0.0; dE = zeros(size(x))
>
> for n = 1:(size(x, 2)-1)
>
>     for k = (n+1):size(x,2)
>
>         J, dJ = phi(x[:,k]-x[:,n])
>
>         E += J
>
>         dE[:, k] += dJ
>
>         dE[:, n] -= dJ
>
>     end
>
> end
>
> return E, dE
>
> end
>
>
>
>
> Many thanks for all your help and comments.
>   Christoph
>
>
>
> On Sunday, 25 May 2014 21:25:27 UTC+1, Viral Shah wrote:
>>
>> There are improvements planned, which should make it possible to write 
>> the code as you originally wrote. For now though, you will have to write it 
>> C style if you want the highest performance.
>>
>> -viral
>>
>> On Monday, May 26, 2014 12:00:50 AM UTC+5:30, Christoph Ortner wrote:
>>>
>>>
>>> Thank you both for the suggestions. I've re-written the code >> JULIA3, 
>>> with amazing results. JULIA2 was the previous "optimised" code.
>>>>
>>>>
>>> Test 1, J2: 0.751340928, J3: 0.008927998, C: 0.007420171; max-error = 0.0
>>>
>>> Test 2, J2: 0.71344446, J3: 0.009042345, C: 0.007583811; max-error = 0.0
>>>
>>> Test 3, J2: 0.719000046, J3: 0.008970343, C: 0.007626061; max-error = 0.0
>>>
>>> Test 4, J2: 0.707967183, J3: 0.008979525, C: 0.007572056; max-error = 0.0
>>>
>>> Test 5, J2: 0.730254977, J3: 0.009012892, C: 0.007649305; max-error = 0.0
>>>
>>> . . ..   (repeating the test gives consistent results; C is gcc with -O3)
>>>
>>> The new code and the test-code are copied below. Of course this means I 
>>> have to write C-style codes in Julia to get this sort of performance. Why 
>>> does Julia not optimise 
>>>
>>>         dE[:, k] += dJ
>>>
>>>         dE[:, n] -= dJ
>>>
>>> to
>>>
>>>         for i = 1:d
>>>
>>>             dE[i, k] += dJ * r[i]
>>>
>>>             dE[i, n] -= dJ * r[i]
>>>
>>>         end
>>>
>>> ?
>>>
>>>
>>> > I think you should also replace your (s*s*s*s*s) with s^5 - it'll 
>>> > automatically do the "right thing", and I'd be surprised if that is 
>>> > slower.
>>>
>>> If I revert to s^5, etc, then I lose 2 orders of magnitude.
>>>
>>>
>>> I will look into NumericExtensions and the profiler next.
>>>
>>>
>>> Thank you again for the help.
>>>
>>>    Christoph
>>>
>>>
>>>
>>> function energy_julia3(x)
>>>
>>> N = size(x,2); d = size(x,1)
>>>
>>> E = 0.0; dE = zeros(d, N)
>>>
>>> r = zeros(d);
>>>
>>> dJ = 0.; s = 0.
>>>
>>> for n = 1:(N-1)
>>>
>>>     for k = (n+1):N
>>>
>>>         s = 0.
>>>
>>>         for i = 1:d
>>>
>>>             r[i] = x[i,k]-x[i,n]
>>>
>>>             s += r[i]*r[i]
>>>
>>>         end
>>>
>>>         E += 1./(s*s*s*s*s*s) - 2. / (s*s*s)
>>>
>>>         dJ = -12. * (1./(s*s*s*s*s*s*s) - 1./(s*s*s*s))
>>>
>>>         for i = 1:d
>>>
>>>             dE[i, k] += dJ * r[i]
>>>
>>>             dE[i, n] -= dJ * r[i]
>>>
>>>         end
>>>
>>>     end
>>>
>>> end
>>>
>>> return E, dE
>>>
>>> end
>>>
>>>
>>> function meshgrid{T}(vx::AbstractVector{T}, vy::AbstractVector{T})
>>>
>>>     m, n = length(vy), length(vx)
>>>
>>>     vx = reshape(vx, 1, n)
>>>
>>>     vy = reshape(vy, m, 1)
>>>
>>>     (repmat(vx, m, 1), repmat(vy, 1, n))
>>>
>>> end
>>>
>>>
>>> function lj_test_juliaopt(N)
>>>
>>>     x = linspace(0, N, N+1)   
>>>     x, y = meshgrid(x, x)
>>>
>>>     x = [x[:] y[:]]'
>>>
>>>    for n = 1:10
>>>
>>>        tic(); Ej2, dEj2 = energy_julia2(x); t2 = toq();
>>>
>>>        tic(); Ej3, dEj3 = energy_julia3(x); t3 = toq();
>>>
>>>        tic();
>>>
>>>        dEc = zeros(size(x))
>>>
>>>        Ec = ccall( (:energy, "./libljtest_c"), Cdouble, (Ptr{Cdouble}, 
>>> Ptr{Cdouble}, Cint, Cint), x, dEc, size(x,2), size(x,1))
>>>
>>>        tc = toq();
>>>
>>>        error = max( abs(Ej2-Ej3), abs(Ej2-Ec), norm(dEj2[:]-dEj3[:], Inf), 
>>> norm(dEj2[:]-dEc[:], Inf) )
>>>
>>>        println("Test ", n, ", J2: ", t2, ", J3: ", t3, ", C: ", tc, "; 
>>> max-error = ", error)
>>>
>>>     end
>>>
>>> end
>>>
>>>

Reply via email to