Hi,

On Saturday, April 25, 2015 at 9:45:45 PM UTC+1, Kristoffer Carlsson wrote:
>
> It is definitely the slicing that is killing performance. Right now, 
> slicing is expensive since you copy a whole new array for it.
>
> Putting const in front of T and RHS and using a loop like this (maybe some 
> mistake but the principle is what is important) makes the code 10x faster:
>
> @inbounds for i=2:NX-1, j=2:NY-1, k=2:NZ-1
>     RHS[i,j,k] = dt*A*( (T[i-1,j,k]-2*T[i,j,k]+T[i+1,j,k])/dx2  +
>                         (T[i,j-1,k]-2*T[i,j,k]+T[i,j+1,k])/dy2  +
>                         (T[i,j,k-1]-2*T[i,j,k]+T[i,j,k+1])/dz2 )
>      
>     T[i,j,k] = T[i,j,k] + RHS[i,j,k]
> end
>
>
Thanks for this. With this implementation I get 18 seconds on my computer, 
and if I change the loops to follow Julia's column-major ordering 

 @inbounds for k=2:NZ-1, j=2:NY-1, i=2:NX-1

then it goes down to 16 seconds, still more than double the time of the 
Fortran implementation, but much more reasonable. I actually like this 
syntax a lot: it makes the equation very clear to read. I'm used to slicing 
from Fortran, but I prefer this way (I have seen similar syntax in another 
programming language, I think Haskell, but cannot remember for sure).
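
For reference, here is a minimal sketch of the whole update written as a 
function with the loops in that order. The function name step!, the grid 
size and the parameter values are made up for illustration; only the loop 
body comes from the code quoted above, and I am assuming T and RHS are plain 
3D Float64 arrays.

```julia
# Hypothetical helper (name and signature are assumptions): one update step.
# The loops are ordered so that the first index (i) varies fastest, which
# matches Julia's column-major array storage.
function step!(T::Array{Float64,3}, RHS::Array{Float64,3}, dt, A, dx2, dy2, dz2)
    NX, NY, NZ = size(T)
    # In a multi-variable `for`, the last variable listed is the innermost loop.
    @inbounds for k in 2:NZ-1, j in 2:NY-1, i in 2:NX-1
        RHS[i,j,k] = dt*A*( (T[i-1,j,k] - 2*T[i,j,k] + T[i+1,j,k])/dx2 +
                            (T[i,j-1,k] - 2*T[i,j,k] + T[i,j+1,k])/dy2 +
                            (T[i,j,k-1] - 2*T[i,j,k] + T[i,j,k+1])/dz2 )
        # T is updated in place inside the same loop, as in the quoted code.
        T[i,j,k] = T[i,j,k] + RHS[i,j,k]
    end
    return T
end

# Small made-up problem, only to exercise the function.
n = 8
T = zeros(n, n, n)
T[4, 4, 4] = 1.0          # single hot cell in the interior
RHS = zeros(n, n, n)
step!(T, RHS, 0.1, 1.0, 1.0, 1.0, 1.0)
```

Wrapping the loop in a function also means that T and RHS are no longer 
accessed as global variables inside the loop, which should serve the same 
purpose as declaring them const.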

Now I have two more questions, to see if I can get better performance:

1) I'm just running the Julia distribution that came with my Ubuntu 
distro, and I don't know how it was compiled. Is there a way to see which 
optimization level and which compiler options were used when compiling 
Julia? Would I be able to get better performance out of Julia if I compiled 
it myself from source, either with a higher optimization flag or perhaps 
even with another compiler? (I have access to the Intel compiler suite 
here.)

2) Is it possible to give optimization flags somehow to the JIT compiler? 
In this case I know that the main_loop function is crucial, and it is going 
to be executed hundreds/thousands of times, so I wouldn't mind spending 
more time the first time it is compiled if it can be optimized as much as 
possible.

Thanks,
Ángel 
 
 
