It is definitely the slicing that is killing performance. Right now, 
slicing is expensive because every slice allocates and copies a whole 
new array.
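
To make the copy visible, here is a tiny sketch. It assumes a recent Julia 
(1.x), where the non-copying alternative is called `view`; the array size 
and the names `s` and `v` are only for illustration and are not from the 
original program.

```julia
T = zeros(4, 4, 4)

# Indexing with ranges allocates a new array and copies the data into it.
s = T[2:3, 2:3, 2:3]

# view() returns a lightweight wrapper that shares memory with T.
v = view(T, 2:3, 2:3, 2:3)

s[1, 1, 1] = 1.0   # writes into the copy only; T is untouched
copy_left_T_alone = (T[2, 2, 2] == 0.0)

v[1, 1, 1] = 2.0   # writes through to T[2, 2, 2]
view_changed_T = (T[2, 2, 2] == 2.0)
```

Every sliced term in a vectorized stencil expression pays for such a copy 
on every iteration, which is where the time goes.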

Putting const in front of T and RHS and replacing the sliced expressions 
with a loop like this (it may contain a mistake, but the principle is what 
matters) makes the code 10x faster:

@inbounds for i=2:NX-1, j=2:NY-1, k=2:NZ-1
    RHS[i,j,k] = dt*A*( (T[i-1,j,k]-2*T[i,j,k]+T[i+1,j,k])/dx2  +
                        (T[i,j-1,k]-2*T[i,j,k]+T[i,j+1,k])/dy2  +
                        (T[i,j,k-1]-2*T[i,j,k]+T[i,j,k+1])/dz2 )
     
    T[i,j,k] = T[i,j,k] + RHS[i,j,k]
end
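
For completeness, here is a self-contained sketch of the same idea wrapped 
in a function. It assumes a recent Julia (1.x) rather than 0.3, and 
`heat_step!` is a hypothetical name, not something from the original 
program. It differs from the loop above in two ways that I think are safer, 
though I have not timed it against the original: RHS is computed for the 
whole grid before T is touched, so each update only sees values from the 
previous time step, and `i` is the innermost loop, since Julia arrays are 
stored column-major.

```julia
# Hypothetical helper (not from the original program): one explicit time
# step of the 3D heat equation on the interior points of T.
function heat_step!(T, RHS, A, dt, dx2, dy2, dz2)
    NX, NY, NZ = size(T)

    # Pass 1: compute the update from the old T only.
    # The last range in a multi-range `for` is the innermost loop, so i
    # (the first, contiguous index) varies fastest.
    @inbounds for k in 2:NZ-1, j in 2:NY-1, i in 2:NX-1
        RHS[i,j,k] = dt*A*( (T[i-1,j,k]-2*T[i,j,k]+T[i+1,j,k])/dx2 +
                            (T[i,j-1,k]-2*T[i,j,k]+T[i,j+1,k])/dy2 +
                            (T[i,j,k-1]-2*T[i,j,k]+T[i,j,k+1])/dz2 )
    end

    # Pass 2: apply the update in place; boundary values stay fixed.
    @inbounds for k in 2:NZ-1, j in 2:NY-1, i in 2:NX-1
        T[i,j,k] += RHS[i,j,k]
    end
    return T
end

# Small usage example: a single hot point in the middle of a 5x5x5 grid.
T = zeros(5, 5, 5)
RHS = zeros(5, 5, 5)
T[3, 3, 3] = 1.0
heat_step!(T, RHS, 1.0, 0.1, 1.0, 1.0, 1.0)
```

Passing T and RHS as arguments also sidesteps the global-variable problem, 
so const is not needed in this form.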




On Saturday, April 25, 2015 at 7:57:01 PM UTC+2, Johan Sigfrids wrote:
>
> I think it is all the slicing that is killing the performance. Maybe 
> something like arrayviews or the new sub stuff on 0.4 would help. 
> Alternatively devectorizing into a bunch of nested loops.
>
> On Saturday, April 25, 2015 at 8:42:09 PM UTC+3, Stefan Karpinski wrote:
>>
>> Stick const in front of T and RHS.
>>
>> On Sat, Apr 25, 2015 at 11:32 AM, Tim Holy wrote:
>>
>>> Did you read through
>>> http://docs.julialang.org/en/release-0.3/manual/performance-tips/? You should
>>> memorize :-) the sections up through the Tools section; the rest you can
>>> consult as you discover you need them.
>>>
>>> --Tim
>>>
>>> On Saturday, April 25, 2015 01:03:38 AM Ángel de Vicente wrote:
>>> > Hi,
>>> >
>>> > a complete Julia newbie here... I spent a couple of days learning the
>>> > syntax and main aspects of Julia, and since I heard many good things
>>> > about it, I decided to try a little program to see how it compares
>>> > against the other ones I regularly use: Fortran and Python.
>>> >
>>> > I wrote a minimal program to solve the 3D heat equation in a cube of
>>> > 100x100x100 points in the three languages and the time it takes to run
>>> > in each one is:
>>> >
>>> > Fortran: ~7s
>>> > Python: ~33s
>>> > Julia:    ~80s
>>> >
>>> > The code runs for 1000 iterations, and I'm being nice to Julia, since
>>> > the programs in Fortran and Python write 100 HDF5 files with the
>>> > complete 100^3 data (every 10 iterations).
>>> >
>>> > I attach the code (and you can also get it at:
>>> > http://pastebin.com/y5HnbWQ1)
>>> >
>>> > Am I doing something obviously wrong? Any suggestions on how to
>>> > improve its speed?
>>> >
>>> > Thanks a lot,
>>> > Ángel de Vicente
>>>
>>>
>>
