These are definitely not stupid questions. Tuning performance at this
level is very hard. I'll just add a few practical notes on
micro-benchmarking and introspection. Inspecting and benchmarking
splatted/slurped functions can be very tricky. One thing to remember is
that `@code_llvm` displays the code for the method returned by `@which`.
This means that if you try to look at the generated code for a slurped
function definition, you won't see the fully specialized case. Let's
modify Stefan's example (quoted below) slightly:
julia> g2(a, bs...) = +(a, bs...)
g2 (generic function with 1 method)
julia> @code_llvm g2(1,2,3)
define %jl_value_t* @julia_g2_22749(%jl_value_t*, %jl_value_t**, i32) {
# … long, with a GC frame and tuple unpacking
This is simply because we're *not* looking at the code for three-argument
`g2`! We're looking at the general variadic case where the number of
arguments is unknown:
julia> @which g2(1,2,3)
g2(a, bs...) at none:1
If, however, we put this into an outer function with a set number of
arguments, we can force Julia to show us the code as it would compile it
when the number of arguments is known:
julia> h(a,b,c) = g2(a,b,c)
h (generic function with 1 method)
julia> @code_llvm h(1,2,3)
define i64 @julia_h_22752(i64, i64, i64) {
top:
%3 = add i64 %1, %0
%4 = add i64 %3, %2
ret i64 %4
}
Finally, benchmarking such tiny things with `@time` can be tricky, which is
why looking at the generated LLVM is very helpful. A few of us are working
on a benchmarking package, https://github.com/johnmyleswhite/Benchmarks.jl,
that should give much more stable and robust results (with confidence
intervals). I'm delinquent on some
documentation on how the `@benchmark` macro works, but here's a quick
rundown: It only works on function calls, and, unlike `@time`, it evaluates
all the arguments first as a setup step. It then benchmarks the function by
calling it repeatedly with the same arguments until it has a good estimate
of the performance, taking care to ensure that LLVM doesn't optimize the
benchmarking loop away or do unwanted constant folding.
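As a quick sketch of the interface (the exact printed output depends on
the package version):

julia> using Benchmarks

julia> v = rand(1000);

julia> @benchmark sum(v)
# runs sum(v) repeatedly with the same argument and reports an
# estimated time per evaluation with a confidence interval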
Matt
On Wednesday, November 11, 2015 at 9:40:28 AM UTC-5, Stefan Karpinski wrote:
>
> On Wed, Nov 11, 2015 at 1:29 AM, 'Greg Plowman' via julia-users <
> [email protected]> wrote:
>
>> I have some naïve and probably stupid questions about writing efficient
>> code in Julia.
>> I almost didn't post this because I'm not sure if this is the right place
>> for asking for this type of conceptual help. I do hope it is appropriate. I
>> apologise in advance if it's not.
>>
>
> This is exactly the right place.
>
> *Splatting/Slurping*
>> I have read that splatting/slurping incurs a penalty.
>> Is it the splatting or the slurping or both?
>> I have also read that this is not the case if the function is inlined. Is
>> this true?
>>
>
> Splatting/slurping is generally free for small, predictably-sized tuples.
> For example, f(a, (b,c)...) will be exactly equivalent to writing out
> f(a, b, c):
>
> julia> g(a, b, c) = +(a, (b, c)...)
> g (generic function with 1 method)
>
> julia> g(1, 2, 3)
> 6
>
> julia> @code_llvm g(1, 2, 3)
>
> define i64 @julia_g_22684(i64, i64, i64) #0 {
> top:
> %3 = add i64 %1, %0
> %4 = add i64 %3, %2
> ret i64 %4
> }
>
>
> The inlining is irrelevant here: if you try this where g calls a function
> that doesn't get inlined, it will produce the same code whether you do the
> splatting or change the args to (a, b, c). This is an excessively simple
> example since it's very easy to see that +(a, (b, c)...) is equivalent to
> +(a, b, c), but since we specialize function bodies on argument types,
> which often includes the size and component types of tuple arguments, what
> type inference is actually reasoning about is often just this simple.
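>
> As a small sketch of that specialization (addall is a hypothetical
> helper; the exact LLVM names will differ on your machine):
>
> julia> addall(t) = +(t...)
> addall (generic function with 1 method)
>
> julia> @code_llvm addall((1, 2, 3))
> # the body is compiled specifically for Tuple{Int64,Int64,Int64}, so
> # this should boil down to the same pair of i64 adds as +(a, b, c)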
>
> What you really want to avoid is splatting dynamically sized
> collections as the arguments. For example, you can sum the values in a
> vector like this:
>
> julia> v = rand(1000);
>
> julia> @time +(v...)
> 0.077058 seconds (39.84 k allocations: 1.930 MB)
> 509.51187339334575
>
> julia> @time +(v...)
> 0.000131 seconds (2.01 k allocations: 71.016 KB)
> 509.51187339334575
>
>
> But this is really bad. Why? Because Julia dispatches function calls on
> all of a function's arguments. In this case there are 1000 arguments, all
> of which, in principle, are being used for dispatch. Not good. So if you
> ever find yourself splatting a variably sized collection into a function's
> arguments, stop and look for a reducer that does the same job. In
> this case, it's the sum function:
>
> julia> @time sum(v)
> 0.000005 seconds (5 allocations: 176 bytes)
> 509.5118733933458
>
>
> As a bonus, our sum function is also more accurate since it uses a better
> summation algorithm than the naive left-to-right accumulation approach.
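>
> A rough way to see the accuracy difference (the exact rounding errors
> depend on the values, so treat the comments as indicative):
>
> julia> w = fill(0.1, 10^7);
>
> julia> abs(sum(w) - 1e6)        # pairwise summation: tiny error
>
> julia> abs(foldl(+, w) - 1e6)   # naive left-to-right: typically much larger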
>
> *Inlining*
>> How do I know if a function is inlined?
>> When is it necessary to explicitly inline? (say with @inline, or
>> Expr(:meta, :inline))
>> Does this guarantee inlining, or is it just a hint to the compiler?
>> Is the compiler usually smart enough to inline optimally?
>> Why wouldn't I explicitly inline all my small functions?
>>
>
> You can tell if something was inlined by looking at the body of the
> calling method and seeing whether there are explicit method calls or not.
> For example, in the output of @code_llvm g(1, 2, 3) above, there are no
> calls to any + methods – the addition operations were fully inlined. If, on the
> other hand, you look at @code_llvm g(big(1), big(2), big(3)), you'll see
> that there are explicit calls to Julia methods with names
> like @"julia_+_22861".
>
> You generally don't want or need to mess with this. The compiler should be
> and generally is good at figuring out what should or shouldn't be inlined.
> However, there are times when a very small method really ought to
> always be inlined, in which case you can annotate the method definition
> with @inline. You can't always force inlining – because it's not always
> possible – but when it is, this skips the heuristic checks that normally
> happen and just does it. Beware, this can make things slower and can bloat
> generated code.
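>
> For instance (a sketch with a made-up accessor; a method this small
> would likely be inlined even without the annotation):
>
> julia> immutable Point
>            x::Int
>            y::Int
>        end
>
> julia> @inline getx(p::Point) = p.x
> getx (generic function with 1 method)
>
> julia> taxi(p::Point) = getx(p) + p.y
> taxi (generic function with 1 method)
>
> julia> @code_llvm taxi(Point(1, 2))
> # no explicit call to a julia_getx method should appear in the output
> # – the accessor has been inlined into taxi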
>
>
>> *Overhead of accessing tuples compared to separate variables/arguments*
>> Is there overhead in accessing tuples vs separate arguments?
>> Is the expression assigned to x slower than the expression assigned to y?
>> I = (1,2,3)
>> i1, i2, i3 = 1, 2, 3
>> x = I[1]+I[2]+I[3]
>> y = i1+i2+i3
>>
>
> In general no, there is no overhead. Caveats: the tuples must be
> reasonably sized (if they are thousands of elements long, this will not be
> fast), and you should avoid confusing type inference. Type
> inference specializes function bodies on arguments, but if it gets called
> with very long tuples or lots of different tuple types, it will bail out
> and just use generic code, which will be slower.
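>
> You can check this yourself (tsum is a hypothetical helper; expect the
> same handful of adds either way):
>
> julia> tsum(I) = I[1] + I[2] + I[3]
> tsum (generic function with 1 method)
>
> julia> @code_llvm tsum((1, 2, 3))
> # indexing a small tuple of known type compiles away entirely – this
> # should produce the same pair of i64 adds as the h(a,b,c) example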
>
>
>> *"Chained" function calls *(not sure if chained is the right word)It
>> seems sometimes I have lots of small "convenience" functions, effectively
>> chained until a final "working" function is called.
>> A calls B calls C calls D calls ... calls Z. If A, B, C, ... are small
>> (maybe getters, convenience functions etc), is this still efficient?
>> Is there any penalty for this?
>> How does this relate to inlining?
>> Presumably there is no penalty if and only if functions are inlined? Is
>> this true?
>> If so, is there a limit to the number of levels (functions in call chain)
>> that can be inlined?
>> Should I be worried about any of this?
>>
>
> Function chaining is cheap. If inlining occurs, it's free. Even if
> inlining doesn't occur, function calls are very cheap on modern CPUs.
> Calling functions can be cheaper than inlining, so one is not uniformly
> better than the other. Function calls are only expensive if dispatch is
> unpredictable and generic dispatch ends up happening. You can tell by
> looking at the LLVM code for calls to jl_apply_generic – if that's
> being called, that's generic dispatch. If that's happening, there may also
> be type instabilities and other badness.
>
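> A contrived sketch of what that looks like (hypothetical functions; the
> real output is long, so just scan it for jl_apply_generic):
>
> julia> unstable(x) = x > 0 ? 1 : 1.0    # return type depends on the value
> unstable (generic function with 1 method)
>
> julia> k(x) = unstable(x) + 1
> k (generic function with 1 method)
>
> julia> @code_llvm k(1)
> # the output should include a call to jl_apply_generic for the +, since
> # the compiler can't know whether unstable returned an Int or a Float64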