Hi,

On Saturday, April 25, 2015 at 9:45:45 PM UTC+1, Kristoffer Carlsson wrote:
>
> It is definitely the slicing that is killing performance. Right now,
> slicing is expensive since you copy a whole new array for it.
>
> Putting const in front of T and RHS and using a loop like this (maybe some
> mistake but the principle is what is important) makes the code 10x faster:
>
> @inbounds for i=2:NX-1, j=2:NY-1, k=2:NZ-1
>     RHS[i,j,k] = dt*A*( (T[i-1,j,k]-2*T[i,j,k]+T[i+1,j,k])/dx2 +
>                         (T[i,j-1,k]-2*T[i,j,k]+T[i,j+1,k])/dy2 +
>                         (T[i,j,k-1]-2*T[i,j,k]+T[i,j,k+1])/dz2 )
>     T[i,j,k] = T[i,j,k] + RHS[i,j,k]
> end

Thanks for this. With this implementation I get 18 seconds on my computer, and if I change the loops to follow Julia's column-major ordering,

@inbounds for k=2:NZ-1, j=2:NY-1, i=2:NX-1

then it goes down to 16 seconds: still more than double the Fortran implementation, but much more reasonable.
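For anyone following along, here is roughly what my updated function now looks like. I pass the arrays and parameters as arguments instead of declaring them const globals (either way avoids the slow non-constant-global problem); the name main_loop! and the example sizes and coefficients are just placeholders from my test case, so treat this as a sketch:

    # Devectorized kernel with the loop order matching Julia's
    # column-major memory layout (k outermost, i innermost).
    function main_loop!(T, RHS, NX, NY, NZ, dt, A, dx2, dy2, dz2)
        @inbounds for k = 2:NZ-1, j = 2:NY-1, i = 2:NX-1
            RHS[i,j,k] = dt*A*( (T[i-1,j,k] - 2*T[i,j,k] + T[i+1,j,k])/dx2 +
                                (T[i,j-1,k] - 2*T[i,j,k] + T[i,j+1,k])/dy2 +
                                (T[i,j,k-1] - 2*T[i,j,k] + T[i,j,k+1])/dz2 )
            T[i,j,k] += RHS[i,j,k]   # in-place update, as in the loop above
        end
        return T
    end

    # Example call with made-up sizes and coefficients:
    NX = NY = NZ = 128
    T   = rand(NX, NY, NZ)
    RHS = zeros(NX, NY, NZ)
    main_loop!(T, RHS, NX, NY, NZ, 1e-4, 0.1, 1.0, 1.0, 1.0)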
And I actually like this syntax a lot: it makes the equation very clear to read. I'm used to slicing from Fortran, but I actually prefer this way (I've seen similar syntax in some other programming language, but I cannot remember which one now; Haskell, I think).

Now I have two more questions, to see if I can get better performance:

1) I'm just running the Julia distribution that came with my Ubuntu distro, and I don't know how it was compiled. Is there a way to see which optimization level and which compiler options were used when Julia was built? And would I get better performance out of Julia if I compiled it myself from source, either using a higher optimization flag or even another compiler? (I have access to the Intel compiler suite here.)

2) Is it possible to pass optimization flags to the JIT compiler somehow? In this case I know that the main_loop function is crucial and will be executed hundreds or thousands of times, so I wouldn't mind spending more time the first time it is compiled if it can be optimized as much as possible.

Thanks,
Ángel
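P.S. Regarding (1): the closest I have found so far is versioninfo(), which prints the Julia version, commit, LLVM version, and BLAS configuration (the verbose form adds CPU and environment details), but as far as I can tell it does not show the compiler flags used for the build itself:

    versioninfo()       # brief build summary: version, commit, LLVM, BLAS
    versioninfo(true)   # verbose: also prints CPU and environment info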

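P.P.S. Regarding (2): in the meantime I have been inspecting what the JIT actually generates for the kernel with the reflection macros; this at least shows whether the inner loop gets vectorized (using the main_loop! sketch from above):

    @code_llvm   main_loop!(T, RHS, NX, NY, NZ, 1e-4, 0.1, 1.0, 1.0, 1.0)
    @code_native main_loop!(T, RHS, NX, NY, NZ, 1e-4, 0.1, 1.0, 1.0, 1.0)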