This makes sense. Thank you! Changing the order of the nested loops also
helps, because Julia stores arrays in column-major order, so the first index
should vary in the innermost loop (note that
const T = zeros(Float64,NX,NY,NZ) ):

for k=2:NZ-1, j=2:NY-1, i=2:NX-1   :   2 sec
for i=2:NX-1, j=2:NY-1, k=2:NZ-1   :   3 sec
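
For anyone who wants to reproduce the comparison, here is a minimal sketch of
the two orderings. It is not the original program: the function names
rhs_i_inner! and rhs_k_inner! and the single coefficient c (standing in for
dt*A with equal grid spacing) are made up for illustration, it only fills RHS
and does not update T, and it assumes a recent Julia version. In a
multi-range for loop the last range is the innermost one.

```julia
# Fast order for column-major arrays: i (the first index) is innermost,
# so consecutive iterations touch adjacent memory.
function rhs_i_inner!(RHS, T, c)
    NX, NY, NZ = size(T)
    @inbounds for k=2:NZ-1, j=2:NY-1, i=2:NX-1
        RHS[i,j,k] = c*( (T[i-1,j,k]-2*T[i,j,k]+T[i+1,j,k]) +
                         (T[i,j-1,k]-2*T[i,j,k]+T[i,j+1,k]) +
                         (T[i,j,k-1]-2*T[i,j,k]+T[i,j,k+1]) )
    end
    return RHS
end

# Slow order: k (the last index) is innermost, so each iteration jumps
# NX*NY elements ahead in memory.
function rhs_k_inner!(RHS, T, c)
    NX, NY, NZ = size(T)
    @inbounds for i=2:NX-1, j=2:NY-1, k=2:NZ-1
        RHS[i,j,k] = c*( (T[i-1,j,k]-2*T[i,j,k]+T[i+1,j,k]) +
                         (T[i,j-1,k]-2*T[i,j,k]+T[i,j+1,k]) +
                         (T[i,j,k-1]-2*T[i,j,k]+T[i,j,k+1]) )
    end
    return RHS
end

NX, NY, NZ = 100, 100, 100
T  = rand(NX, NY, NZ)
R1 = zeros(Float64, NX, NY, NZ)
R2 = zeros(Float64, NX, NY, NZ)

# Call each function once first so compilation is not included in the timing.
rhs_i_inner!(R1, T, 0.1)
rhs_k_inner!(R2, T, 0.1)

t_fast = @elapsed rhs_i_inner!(R1, T, 0.1)
t_slow = @elapsed rhs_k_inner!(R2, T, 0.1)
println("i innermost: ", t_fast, " s   k innermost: ", t_slow, " s")
```

Both functions compute the same RHS; only the traversal order differs. Because
the arrays are passed as function arguments, the const issue discussed below
does not come into play here. The actual timings will depend on the machine
and on the array size.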


On Sunday, April 26, 2015 at 6:28:38 AM UTC-4, Kristoffer Carlsson wrote:
>
> If you don't have the const, you pay a cost every time you *access* the
> variable (you need to unbox it).
>
> Now, with the original slicing there are not that many *accesses* to T and
> RHS, so the cost is dominated by the overhead of the slicing operation.
>
> When you use a triple nested for loop you access the variables T and RHS
> much more frequently, so this time the const in front of T and RHS is much
> more important, since you have to pay the unboxing "fee" on every loop
> iteration.
>
> Hope that made sense.
>
> // Kristoffer
>
> On Sunday, April 26, 2015 at 3:33:26 AM UTC+2, Pooya wrote:
>>
>> Ah! I am also a newbie, and am trying to get a sense of how efficiency
>> improvements work in Julia, so I tried a few combinations of the
>> suggestions here, and this doesn't make sense to me. Putting const in
>> front of T and RHS in the original code does not change the running time
>> much, but it is critical for the performance of the following nested
>> loops. With nested loops and without const, the running time is much
>> longer than for the original code. The following are the exact run times
>> on my computer (for 100 time steps, NT = 100):
>>
>> original code:                                    21 sec
>> original code with const in front of T and RHS:   20 sec
>> nested loops w/out const in front of T and RHS:  200 sec
>> nested loops with const in front of T and RHS:     3 sec
>>
>> Also, @inbounds does not have much effect in any of the above cases, just 
>> a little!  Any thoughts are appreciated!  
>>
>>
>> On Saturday, April 25, 2015 at 4:45:45 PM UTC-4, Kristoffer Carlsson 
>> wrote:
>>>
>>> It is definitely the slicing that is killing performance. Right now,
>>> slicing is expensive since each slice copies its data into a new array.
>>>
>>> Putting const in front of T and RHS and using a loop like this (maybe 
>>> some mistake but the principle is what is important) makes the code 10x 
>>> faster:
>>>
>>> @inbounds for i=2:NX-1, j=2:NY-1, k=2:NZ-1
>>>     RHS[i,j,k] = dt*A*( (T[i-1,j,k]-2*T[i,j,k]+T[i+1,j,k])/dx2  +
>>>                         (T[i,j-1,k]-2*T[i,j,k]+T[i,j+1,k])/dy2  +
>>>                         (T[i,j,k-1]-2*T[i,j,k]+T[i,j,k+1])/dz2 )
>>>      
>>>     T[i,j,k] = T[i,j,k] + RHS[i,j,k]
>>> end
>>>
>>>
>>>
>>>
>>> On Saturday, April 25, 2015 at 7:57:01 PM UTC+2, Johan Sigfrids wrote:
>>>>
>>>> I think it is all the slicing that is killing the performance. Maybe 
>>>> something like arrayviews or the new sub stuff on 0.4 would help. 
>>>> Alternatively devectorizing into a bunch of nested loops.
>>>>
>>>> On Saturday, April 25, 2015 at 8:42:09 PM UTC+3, Stefan Karpinski wrote:
>>>>>
>>>>> Stick const in front of T and RHS.
>>>>>
>>>>> On Sat, Apr 25, 2015 at 11:32 AM, Tim Holy <[email protected]> wrote:
>>>>>
>>>>>> Did you read through
>>>>>> http://docs.julialang.org/en/release-0.3/manual/performance-tips/?
>>>>>> You should memorize :-) the sections up through the Tools section;
>>>>>> the rest you can consult as you discover you need them.
>>>>>>
>>>>>> --Tim
>>>>>>
>>>>>> On Saturday, April 25, 2015 01:03:38 AM Ángel de Vicente wrote:
>>>>>> > Hi,
>>>>>> >
>>>>>> > a complete Julia newbie here... I spent a couple of days learning
>>>>>> > the syntax and main aspects of Julia, and since I heard many good
>>>>>> > things about it, I decided to try a little program to see how it
>>>>>> > compares against the other ones I regularly use: Fortran and
>>>>>> > Python.
>>>>>> >
>>>>>> > I wrote a minimal program to solve the 3D heat equation in a cube
>>>>>> > of 100x100x100 points in the three languages and the time it takes
>>>>>> > to run in each one is:
>>>>>> >
>>>>>> > Fortran: ~7s
>>>>>> > Python: ~33s
>>>>>> > Julia:    ~80s
>>>>>> >
>>>>>> > The code runs for 1000 iterations, and I'm being nice to Julia,
>>>>>> > since the programs in Fortran and Python write 100 HDF5 files with
>>>>>> > the complete 100^3 data (every 10 iterations).
>>>>>> >
>>>>>> > I attach the code (and you can also get it at:
>>>>>> > http://pastebin.com/y5HnbWQ1)
>>>>>> >
>>>>>> > Am I doing something obviously wrong? Any suggestions on how to
>>>>>> > improve its speed?
>>>>>> >
>>>>>> > Thanks a lot,
>>>>>> > Ángel de Vicente
>>>>>>
>>>>>>
>>>>>