[julia-users] Re: help understanding different ways of wrapping functions

K leo Wed, 28 Sep 2016 05:01:21 -0700

Thanks so much for the tips.  The culprit is the keyword argument 
(xRat=0.).  Declaring its type made the wrapped code twice as fast, but 
still far slower than the inline code.  Making it a positional argument 
instead left the wrapped code only slightly slower than the inline code, 
which is a big improvement.
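
For anyone following along, the three variants discussed here can be sketched as below. This is only a minimal illustration, not the code from the thread: the names kernel_kw!, kernel_typed_kw!, kernel_pos! and the loop_*! drivers are hypothetical, and plain arrays stand in for the Tech fields a and c. The slowdown reported above was measured on Julia 0.5; other releases may handle keyword arguments differently, so the gap may not reproduce.

```julia
# Hypothetical stand-ins for M_CPS; `a` and `c` play the roles of tech.a and tech.c.

# Variant 1: untyped keyword argument, as in the original M_CPS.
function kernel_kw!(a::Vector{Float64}, c::Vector{Int}, i::Int, csign::Int; xRat=0.0)
    if csign == c[i]
        a[i] += 2.0 * xRat
    else
        a[i] = 0.0
    end
    return nothing
end

# Variant 2: the same keyword argument with a declared type.
function kernel_typed_kw!(a::Vector{Float64}, c::Vector{Int}, i::Int, csign::Int; xRat::Float64=0.0)
    if csign == c[i]
        a[i] += 2.0 * xRat
    else
        a[i] = 0.0
    end
    return nothing
end

# Variant 3: xRat as a positional argument with a default value.
function kernel_pos!(a::Vector{Float64}, c::Vector{Int}, i::Int, csign::Int, xRat::Float64=0.0)
    if csign == c[i]
        a[i] += 2.0 * xRat
    else
        a[i] = 0.0
    end
    return nothing
end

# Drivers: the outer loop runs over xRat, the inner loop over the array index.
function loop_kw!(a, c, csign)
    for xRat in 1.0:20.0, i in eachindex(a)
        kernel_kw!(a, c, i, csign; xRat=xRat)
    end
    return a
end

function loop_typed_kw!(a, c, csign)
    for xRat in 1.0:20.0, i in eachindex(a)
        kernel_typed_kw!(a, c, i, csign; xRat=xRat)
    end
    return a
end

function loop_pos!(a, c, csign)
    for xRat in 1.0:20.0, i in eachindex(a)
        kernel_pos!(a, c, i, csign, xRat)
    end
    return a
end
```

All three drivers compute the same result. Calling each one once to compile it and then wrapping a second call in @time or @allocated should show whether the keyword call allocates on a given Julia version.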


On Wednesday, September 28, 2016 at 2:50:40 PM UTC+8, Gunnar Farnebäck 
wrote:
>
> It's normal that manually inlined code of this kind is faster than wrapped 
> code unless the compiler manages to see the full inlining potential. In 
> this case the huge memory allocations for the wrapped solutions indicate 
> that it's nowhere near doing that. I doubt it will take you all the way, 
> but start by modifying your inner M_CPS function to take only positional 
> arguments, or by declaring the type of the keyword argument as suggested 
> in the performance tips section of the manual.
>
> Den onsdag 28 september 2016 kl. 06:29:37 UTC+2 skrev K leo:
>>
>> I tested a few different ways of wrapping functions.  It looks like 
>> different ways of wrapping have slightly different costs.  But what 
>> confuses me most is that putting everything inline is much faster than 
>> wrapping things up.  I would understand this in other languages, but I 
>> thought Julia encourages simple wrapping.  Can anyone help explain what 
>> is happening below, and how I can wrap most efficiently in the demo code?
>>
>> Demo code is included below.
>>
>> julia> versioninfo()
>> Julia Version 0.5.0
>> Commit 3c9d753 (2016-09-19 18:14 UTC)
>> Platform Info:
>>   System: Linux (x86_64-pc-linux-gnu)
>>   CPU: Intel(R) Core(TM) i5-5257U CPU @ 2.70GHz
>>   WORD_SIZE: 64
>>   BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
>>   LAPACK: libopenblas64_
>>   LIBM: libopenlibm
>>   LLVM: libLLVM-3.7.1 (ORCJIT, broadwell)
>>
>> julia> testFunc()
>> calling LoopCP (everything inline)
>>   0.097556 seconds (2.10 k allocations: 290.625 KB)
>> elapsed time (ns): 97555896
>> bytes allocated:   297600
>> pool allocs:       2100
>> [0.0,4200.0,0.0,0.0,4200.0,4200.0,4200.0,4200.0,0.0,4200.0,4200.0]
>>
>> calling LoopCP0 (slightly wrapped)
>>   4.173830 seconds (49.78 M allocations: 2.232 GB, 5.83% gc time)
>> elapsed time (ns): 4173830495
>> gc time (ns):      243516584
>> bytes allocated:   2396838538
>> pool allocs:       49783357
>> GC pauses:         104
>> full collections:  1
>> [4200.0,0.0,4200.0,4200.0,0.0,0.0,0.0,0.0,4200.0,0.0,0.0]
>>
>> calling LoopCP1 (wrapped one way)
>>   5.274723 seconds (59.59 M allocations: 2.378 GB, 3.62% gc time)
>> elapsed time (ns): 5274722983
>> gc time (ns):      191036337
>> bytes allocated:   2553752638
>> pool allocs:       59585834
>> GC pauses:         112
>> [8400.0,0.0,8400.0,8400.0,0.0,0.0,0.0,0.0,8400.0,0.0,0.0]
>>
>> calling LoopCP2 (wrapped another way)
>>   5.212895 seconds (59.58 M allocations: 2.378 GB, 3.60% gc time)
>> elapsed time (ns): 5212894550
>> gc time (ns):      187696529
>> bytes allocated:   2553577600
>> pool allocs:       59582100
>> GC pauses:         111
>> [0.0,8400.0,0.0,0.0,8400.0,8400.0,8400.0,8400.0,0.0,8400.0,8400.0]
>>
>> const dim=1000
>>
>> type Tech
>>     a::Array{Float64,1}
>>     c::Array{Int,1}
>>
>>     function Tech()
>>         this = new()
>>         this.a = zeros(Float64, dim)
>>         this.c = rand([0,1;], dim)
>>         this
>>     end
>> end
>>
>> function LoopCP(csign::Int, tech::Tech)
>>     for j=1:10
>>         for xRat in [1.:20.;]
>>             @inbounds for i = 1:dim
>>                 if csign == tech.c[i]
>>                     tech.a[i] += 2.*xRat
>>                 else
>>                     tech.a[i] = 0.
>>                 end
>>             end
>>         end #
>>     end
>>     nothing
>> end
>>
>> function M_CPS(i::Int, csign::Int, tech::Tech; xRat=0.)
>>     if csign == tech.c[i]
>>         tech.a[i] += 2.*xRat
>>     else
>>         tech.a[i] = 0.
>>     end
>>     nothing
>> end
>>
>> function LoopCP0(csign::Int, tech::Tech)
>>     for j=1:10
>>         for xRat in [1.:20.;]
>>             @inbounds for i = 1:dim
>>                 M_CPS(i, csign, tech, xRat=xRat)
>>             end
>>         end #
>>     end
>>     nothing
>> end
>>
>> function MoleculeWrapS(csign::Int, tech::Tech, molecule::Function, xRat=0.)
>>     @inbounds for i = 1:dim
>>         molecule(i, csign, tech; xRat=xRat)
>>     end
>>     nothing
>> end
>>
>> function LoopRunnerM1(csign::Int, tech::Tech, molecule::Function)
>>     for j=1:10
>>         for xRat in [1.:20.;]
>>             MoleculeWrapS(csign, tech, molecule, xRat)
>>         end #
>>     end
>>     nothing
>> end
>>
>> LoopCP1(csign::Int, tech::Tech) = LoopRunnerM1(csign, tech, M_CPS)
>>
>> WrapCPS(csign::Int, tech::Tech, xRat=0.) = MoleculeWrapS(csign, tech, M_CPS, xRat)
>>
>> function LoopRunnerM2(csign::Int, tech::Tech, loop::Function)
>>     for j=1:10
>>         for xRat in [1.:20.;]
>>             loop(csign, tech, xRat)
>>         end #
>>     end
>>     nothing
>> end
>>
>> LoopCP2(csign::Int, tech::Tech) = LoopRunnerM2(csign, tech, WrapCPS)
>>
>> function testFunc()
>>     tech = Tech()
>>     nloops = 100
>>
>>     println("calling LoopCP (everything inline)")
>>     tech.a = zeros(tech.a)
>>     @timev for i=1:nloops
>>         LoopCP(rand([0,1]), tech)
>>     end
>>     println(tech.a[10:20], "\n")
>>
>>     println("calling LoopCP0 (slightly wrapped)")
>>     tech.a = zeros(tech.a)
>>     @timev for i=1:nloops
>>         LoopCP0(rand([0,1]), tech)
>>     end
>>     println(tech.a[10:20], "\n")
>>
>>     println("calling LoopCP1 (wrapped one way)")
>>     tech.a = zeros(tech.a)
>>     @timev for i=1:nloops
>>         LoopCP1(rand([0,1]), tech)
>>     end
>>     println(tech.a[10:20], "\n")
>>
>>     println("calling LoopCP2 (wrapped another way)")
>>     tech.a = zeros(tech.a)
>>     @timev for i=1:nloops
>>         LoopCP2(rand([0,1]), tech)
>>     end
>>     println(tech.a[10:20], "\n")
>>
>>     nothing
>> end
