If you don't have the const, you pay a cost every time you *access* the variable (you need to unbox it).
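
A rough sketch of that cost, assuming a recent Julia 1.x rather than the 0.3 discussed in this thread. The names T_plain, T_const, sum_plain, sum_const and sum_arg are made up for illustration and are not from the original program. The three functions do the same work and differ only in how they reach the array:

```julia
# Hypothetical example, not the code from this thread.
T_plain = rand(50, 50, 50)        # non-const global: its type may change at any time
const T_const = rand(50, 50, 50)  # const global: its type is fixed

function sum_plain()
    s = 0.0
    for i in eachindex(T_plain)
        s += T_plain[i]   # the global is looked up and unboxed on every access
    end
    return s
end

function sum_const()
    s = 0.0
    for i in eachindex(T_const)
        s += T_const[i]   # the compiler knows this is an Array{Float64,3}
    end
    return s
end

# Passing the array as an argument avoids the global altogether.
function sum_arg(A)
    s = 0.0
    for i in eachindex(A)
        s += A[i]
    end
    return s
end

# Call each function once first so compilation time is not measured.
sum_plain(); sum_const(); sum_arg(T_plain)

@time sum_plain()
@time sum_const()
@time sum_arg(T_plain)
```

I would expect the first timing to be clearly the slowest and to report many allocations, but the exact numbers depend on the machine and the Julia version.
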
Now, with the original slicing there are not that many *accesses* to T and RHS, so the cost is dominated by the overhead of the slicing operation. With a triple nested for loop you access T and RHS much more frequently, so the const in front of T and RHS matters much more: you pay the unboxing "fee" on every loop iteration.

Hope that made sense.

// Kristoffer

On Sunday, April 26, 2015 at 3:33:26 AM UTC+2, Pooya wrote:
>
> Ah! I am also a newbie, and am trying to get a sense for efficiency
> improvement in julia, and how it works, so I tried a few combinations
> of the suggestions here, and this doesn't make sense to me. Putting const in
> front of T and RHS in the original code does not change running time much,
> but it is critical for the performance of the following nested loops. With
> nested loops, without const, running time is much more than the original
> code. The following are the exact run times on my computer (for 100 time
> steps NT = 100):
>
> original code: 21 sec
> original code with const in front of T and RHS: 20 sec
> nested loops w/out const in front of T and RHS: 200 sec
> nested loops with const in front of T and RHS: 3 sec
>
> Also, @inbounds does not have much effect in any of the above cases, just
> a little! Any thoughts are appreciated!
>
>
> On Saturday, April 25, 2015 at 4:45:45 PM UTC-4, Kristoffer Carlsson wrote:
>>
>> It is definitely the slicing that is killing performance. Right now,
>> slicing is expensive since you copy a whole new array for it.
>>
>> Putting const in front of T and RHS and using a loop like this (maybe
>> some mistake but the principle is what is important) makes the code 10x
>> faster:
>>
>> @inbounds for i=2:NX-1, j=2:NY-1, k=2:NZ-1
>>     RHS[i,j,k] = dt*A*( (T[i-1,j,k]-2*T[i,j,k]+T[i+1,j,k])/dx2 +
>>                         (T[i,j-1,k]-2*T[i,j,k]+T[i,j+1,k])/dy2 +
>>                         (T[i,j,k-1]-2*T[i,j,k]+T[i,j,k+1])/dz2 )
>>
>>     T[i,j,k] = T[i,j,k] + RHS[i,j,k]
>> end
>>
>>
>> On Saturday, April 25, 2015 at 7:57:01 PM UTC+2, Johan Sigfrids wrote:
>>>
>>> I think it is all the slicing that is killing the performance. Maybe
>>> something like arrayviews or the new sub stuff on 0.4 would help.
>>> Alternatively devectorizing into a bunch of nested loops.
>>>
>>> On Saturday, April 25, 2015 at 8:42:09 PM UTC+3, Stefan Karpinski wrote:
>>>>
>>>> Stick const in front of T and RHS.
>>>>
>>>> On Sat, Apr 25, 2015 at 11:32 AM, Tim Holy wrote:
>>>>
>>>>> Did you read through
>>>>> http://docs.julialang.org/en/release-0.3/manual/performance-tips/?
>>>>> You should memorize :-) the sections up through the Tools section;
>>>>> the rest you can consult as you discover you need them.
>>>>>
>>>>> --Tim
>>>>>
>>>>> On Saturday, April 25, 2015 01:03:38 AM Ángel de Vicente wrote:
>>>>> > Hi,
>>>>> >
>>>>> > a complete Julia newbie here... I spent a couple of days learning the
>>>>> > syntax and main aspects of Julia, and since I heard many good things
>>>>> > about it, I decided to try a little program to see how it compares
>>>>> > against the other ones I regularly use: Fortran and Python.
>>>>> >
>>>>> > I wrote a minimal program to solve the 3D heat equation in a cube of
>>>>> > 100x100x100 points in the three languages and the time it takes to
>>>>> > run in each one is:
>>>>> >
>>>>> > Fortran: ~7s
>>>>> > Python: ~33s
>>>>> > Julia: ~80s
>>>>> >
>>>>> > The code runs for 1000 iterations, and I'm being nice to Julia,
>>>>> > since the programs in Fortran and Python write 100 HDF5 files with
>>>>> > the complete 100^3 data (every 10 iterations).
>>>>> >
>>>>> > I attach the code (and you can also get it at:
>>>>> > http://pastebin.com/y5HnbWQ1)
>>>>> >
>>>>> > Am I doing something obviously wrong? Any suggestions on how to
>>>>> > improve its speed?
>>>>> >
>>>>> > Thanks a lot,
>>>>> > Ángel de Vicente
