If you're using Julia 0.4 you can also use call overloading, which is 
almost as convenient as closures and as fast as const globals. An extension 
of Tomas's benchmark 
<https://gist.github.com/afniedermayer/58ca3ba2c418617d85b4> gives me this:

Non-const global
  0.027968 seconds (999.84 k allocations: 15.256 MB)
Const global
  0.000003 seconds (6 allocations: 192 bytes)
Closure
  0.000015 seconds (19 allocations: 928 bytes)
Call Overloading
  0.000003 seconds (6 allocations: 192 bytes)

I'm not sure whether the current version of Optim.jl and other libraries 
already support callbacks with call overloading.
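
For reference, the pattern looks roughly like this on 0.4 (a minimal sketch; 
`Summer` and `g` are names I made up here, not taken from the gist):

```
# Julia 0.4: make a type callable by overloading Base.call.
# The "captured" data lives in a typed field, so the compiler can specialize.
immutable Summer
    a::Int
end

function Base.call(f::Summer, N)
    s = 0
    for i = 1:N
        s += f.a
    end
    return s
end

g(f, N) = f(N)        # a higher-order function; f can be any callable
g(Summer(3), 10^6)    # same answer as the closure version
```

(On 0.5 and later this became the callable-struct syntax `(f::Summer)(N) = ...`.)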

On Tuesday, September 29, 2015 at 11:11:42 AM UTC+2, Tomas Lycken wrote:
>
> Kristoffer, your example still uses a *non-const* global. Using this 
> benchmark <https://gist.github.com/tlycken/f0dc12b7c211f73f6cc1>, I get 
> the following results:
>
> Non-const global
>   0.017629 seconds (999.84 k allocations: 15.256 MB)
> Const global
>   0.000002 seconds (6 allocations: 192 bytes)
> Closure
>   0.000008 seconds (19 allocations: 928 bytes)
>
> So yes, a closure is much better than using a non-const global, but using 
> a const global still beats it. However, both are fast enough that it 
> probably won’t be the bottleneck in your application anymore :)
>
> Although I can by no means claim to fully read and understand it, it is 
> also instructive to look at the output of @code_llvm to get a picture of 
> the difference in code complexity:
>
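> For reference, the two functions whose dumps follow look roughly like this 
> (a minimal sketch reconstructed from the calls below; the gist has the 
> actual definitions):
>
> ```
> const c = 3  # the const global
>
> function f_const_glob(N)
>     s = 0
>     for i = 1:N
>         s += c
>     end
>     return s
> end
>
> function f_clos(N, a)
>     function f_inner(N)
>         s = 0
>         for i = 1:N
>             s += a
>         end
>         return s
>     end
>     f_inner(N)
> end
> ```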
> Const global:
>
> julia> @code_llvm f_const_glob(10^5)
>
> define i64 @julia_f_const_glob_2617(i64) {
> top:
>   %1 = icmp slt i64 %0, 1
>   br i1 %1, label %L3, label %L.preheader
>
> L.preheader:                                      ; preds = %top
>   %2 = icmp sgt i64 %0, 0
>   %.op = mul i64 %0, 3
>   %3 = select i1 %2, i64 %.op, i64 0
>   br label %L3
>
> L3:                                               ; preds = %L.preheader, %top
>   %s.1 = phi i64 [ 0, %top ], [ %3, %L.preheader ]
>   ret i64 %s.1
> }
>
> Closure:
>
> julia> @code_llvm f_clos(10^5, 3)
>
> define %jl_value_t* @julia_f_clos_2616(i64, i64) {
> top:
>   %2 = alloca [5 x %jl_value_t*], align 8
>   %.sub = getelementptr inbounds [5 x %jl_value_t*]* %2, i64 0, i64 0
>   %3 = getelementptr [5 x %jl_value_t*]* %2, i64 0, i64 2
>   %4 = getelementptr [5 x %jl_value_t*]* %2, i64 0, i64 3
>   store %jl_value_t* inttoptr (i64 6 to %jl_value_t*), %jl_value_t** %.sub, align 8
>   %5 = getelementptr [5 x %jl_value_t*]* %2, i64 0, i64 1
>   %6 = load %jl_value_t*** @jl_pgcstack, align 8
>   %.c = bitcast %jl_value_t** %6 to %jl_value_t*
>   store %jl_value_t* %.c, %jl_value_t** %5, align 8
>   store %jl_value_t** %.sub, %jl_value_t*** @jl_pgcstack, align 8
>   store %jl_value_t* null, %jl_value_t** %3, align 8
>   store %jl_value_t* null, %jl_value_t** %4, align 8
>   %7 = getelementptr [5 x %jl_value_t*]* %2, i64 0, i64 4
>   store %jl_value_t* null, %jl_value_t** %7, align 8
>   %8 = load %jl_value_t** inttoptr (i64 2147534600 to %jl_value_t**), align 8
>   store %jl_value_t* %8, %jl_value_t** %4, align 8
>   %9 = load %jl_value_t** inttoptr (i64 2164502072 to %jl_value_t**), align 8
>   store %jl_value_t* %9, %jl_value_t** %7, align 8
>   %10 = call %jl_value_t* @jl_f_instantiate_type(%jl_value_t* null, %jl_value_t** %4, i32 2)
>   store %jl_value_t* %10, %jl_value_t** %4, align 8
>   %11 = call %jl_value_t* @jl_f_svec(%jl_value_t* null, %jl_value_t** null, i32 0)
>   store %jl_value_t* %11, %jl_value_t** %7, align 8
>   %12 = call %jl_value_t* @jl_f_svec(%jl_value_t* null, %jl_value_t** %4, i32 2)
>   store %jl_value_t* %12, %jl_value_t** %4, align 8
>   %13 = call %jl_value_t* @jl_box_int64(i64 signext %1)
>   store %jl_value_t* %13, %jl_value_t** %7, align 8
>   %14 = call %jl_value_t* (i64, ...)* @jl_svec(i64 1, %jl_value_t* %13)
>   store %jl_value_t* %14, %jl_value_t** %7, align 8
>   %15 = call %jl_value_t* @jl_new_closure(i8* null, %jl_value_t* %14, %jl_value_t* inttoptr (i64 2221717744 to %jl_value_t*))
>   store %jl_value_t* %15, %jl_value_t** %7, align 8
>   %16 = load %jl_value_t** @jl_false, align 8
>   %17 = call %jl_value_t* @jl_method_def(%jl_value_t* inttoptr (i64 507171808 to %jl_value_t*), %jl_value_t** %3, %jl_value_t* null, %jl_value_t* null, %jl_value_t* %12, %jl_value_t* %15, %jl_value_t* %16, %jl_value_t* inttoptr (i64 2179532144 to %jl_value_t*), i32 0)
>   %18 = load %jl_value_t** %3, align 8
>   %19 = bitcast %jl_value_t* %18 to %jl_value_t* (%jl_value_t*, %jl_value_t**, i32)**
>   %20 = load %jl_value_t* (%jl_value_t*, %jl_value_t**, i32)** %19, align 8
>   %21 = call %jl_value_t* @jl_box_int64(i64 signext %0)
>   store %jl_value_t* %21, %jl_value_t** %4, align 8
>   %22 = call %jl_value_t* %20(%jl_value_t* %18, %jl_value_t** %4, i32 1)
>   %23 = load %jl_value_t** %5, align 8
>   %24 = getelementptr inbounds %jl_value_t* %23, i64 0, i32 0
>   store %jl_value_t** %24, %jl_value_t*** @jl_pgcstack, align 8
>   ret %jl_value_t* %22
> }
>
> On Tuesday, September 29, 2015 at 9:18:29 AM UTC+2, Kristoffer Carlsson 
> wrote:
>
> In my experience a closure is much faster than a function that accesses 
>> global variables even if it is passed to another function as an argument.
>>
>> Two different examples. Here we pass a function that accesses (non const) 
>> globals to another function:
>>
>> a = 3
>>
>> g(f::Function, b, N) = f(b, N)
>>
>> function f_glob(b, N)
>>     s = 0
>>     for i = 1:N
>>         s += a
>>     end
>>     return s
>> end
>>
>> @time g(f_glob, 3, 10^6)
>>
>> 3000000
>>
>> 0.020655 seconds (999.84 k allocations: 15.256 MB, 18.40% gc time)
>>
>>
>> Here we instead use an outer function that generates the desired closure 
>> and pass the closure as the function argument:
>>
>> function f_clos(b, N, a)
>>     function f_inner(b, N)
>>         s = 0
>>         for i = 1:N
>>             s += a
>>         end
>>         return s
>>     end
>>     
>>     g(f_inner, b, N)
>> end
>>
>> @time f_clos(3, 10^6, 2)
>>
>> 2000000
>>
>> 0.000021 seconds (19 allocations: 928 bytes)
>>
>> On Tuesday, September 29, 2015 at 8:47:06 AM UTC+2, Tomas Lycken wrote:
>>>
>>> The performance penalty from global variables comes from the fact that 
>>> non-const global variables are type unstable, which means that the compiler 
>>> has to generate very defensive, and thus slow, code. The same thing is, for 
>>> now, true also for anonymous functions and functions passed as parameters 
>>> to other functions, so switching one paradigm for the other will 
>>> unfortunately not help much for performance. I haven't looked closely at 
>>> your code, but the [FastAnonymous.jl](
>>> https://github.com/timholy/FastAnonymous.jl) package can probably help 
>>> with making the anonymous function version faster.
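>>>
>>> For instance, FastAnonymous.jl's usage is roughly this (a sketch based on 
>>> its README; I haven't tried it against your code):
>>>
>>> ```
>>> using FastAnonymous
>>>
>>> # @anon wraps the anonymous function in a type the compiler can
>>> # specialize on, avoiding the generic-function penalty.
>>> squarer = @anon x -> x^2
>>> squarer(3)  # callable like a normal function, e.g. as a callback
>>> ```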
>>>
>>> If the data you want to pass along doesn't change, using *const* globals 
>>> is fine (at least from a performance perspective, you might have other 
>>> reasons to avoid them too...), i.e. you can do the following and it will 
>>> also be faster:
>>>
>>> ```
>>> const data = 3
>>>
>>> function model(x)
>>>     # use data somehow
>>> end
>>>
>>> # use the model function in your optimization
>>> ```
>>>
>>> If you need the data to change, you can still use a `const` global 
>>> variable, but set it to some mutable (and type-stable!) object, for example
>>>
>>> ```
>>> type MyData
>>>     data::Int
>>> end
>>>
>>> const data = MyData(3)
>>>
>>> function model(x)
>>>     data.data += 1
>>>     # ...
>>> end
>>>
>>> # use the model function in your optimization
>>> # it can now modify the global data on each invocation
>>> ```
>>>
>>> This doesn't necessarily provide the cleanest interface, but it should 
>>> be more performant than your current solution.
>>>
>>> // T
>>>     
>>>
>>> On Monday, September 28, 2015 at 6:53:22 PM UTC+2, Christopher Fisher 
>>> wrote:
>>>>
>>>> Thanks for your willingness to investigate the matter more closely. I 
>>>> cannot post the exact code I am using (besides, it's rather long). 
>>>> However, I posted a toy example that follows the same basic operations. 
>>>> Essentially, my code involves an outer function (SubLoop) that loops 
>>>> through a data set with multiple subjects, fitting the model to each 
>>>> subject's data. The other function (LogLike) computes the log likelihood 
>>>> and is called by optimize. The first set of code corresponds to the 
>>>> closure method and the second to the global-variable method. In both 
>>>> cases, the code executed in about 0.85 seconds over several runs on my 
>>>> computer, with about 1.9% gc time. I'm also aware that my code is 
>>>> probably not optimized in other regards, so I would be receptive to any 
>>>> other advice you might have.
>>>>
>>>> using Distributions, Optim
>>>>
>>>> function SubLoop1(data1)
>>>>     function LogLike1(parms)
>>>>         L = pdf(Normal(parms[1],exp(parms[2])),SubData)
>>>>         LL = -sum(log(L))
>>>>     end
>>>>
>>>>     #Number of Subjects
>>>>     Nsub = size(unique(data1[:,1],1),1)
>>>>
>>>>     #Initialize per subject Data
>>>>     SubData = []
>>>>     for i = 1:Nsub
>>>>         idx = data1[:,1] .== i
>>>>         SubData = data1[idx,2]
>>>>         parms0 = [1.0;1.0]
>>>>         optimize(LogLike1,parms0,method=:nelder_mead)
>>>>     end
>>>> end
>>>>
>>>> N = 10^5
>>>>
>>>> #Column 1 subject index, column 2 value
>>>> Data = zeros(N*2,2)
>>>> for sub = 1:2
>>>>     Data[(N*(sub-1)+1):(N*sub),:] = [sub*ones(N) rand(Normal(10,2),N)]
>>>> end
>>>>
>>>> @time SubLoop1(Data)
>>>>
>>>> using Distributions, Optim
>>>>
>>>> function SubLoop2(data1)
>>>>     global SubData
>>>>
>>>>     #Number of subjects
>>>>     Nsub = size(unique(data1[:,1],1),1)
>>>>
>>>>     #Initialize per subject data
>>>>     SubData = []
>>>>     for i = 1:Nsub
>>>>         idx = data1[:,1] .== i
>>>>         SubData = data1[idx,2]
>>>>         parms0 = [1.0;1.0]
>>>>         optimize(LogLike2,parms0,method=:nelder_mead)
>>>>     end
>>>> end
>>>>
>>>> function LogLike2(parms)
>>>>     L = pdf(Normal(parms[1],exp(parms[2])),SubData)
>>>>     LL = -sum(log(L))
>>>> end
>>>>
>>>> N = 10^5
>>>>
>>>> #Column 1 subject index, column 2 value
>>>> Data = zeros(N*2,2)
>>>> for sub = 1:2
>>>>     Data[(N*(sub-1)+1):(N*sub),:] = [sub*ones(N) rand(Normal(10,2),N)]
>>>> end
>>>>
>>>> @time SubLoop2(Data)
>>>>
>>>> On Monday, September 28, 2015 at 11:24:13 AM UTC-4, Kristoffer Carlsson 
>>>> wrote:
>>>>>
>>>>> From only that comment alone it is hard to give any further advice. 
>>>>>
>>>>> What overhead are you seeing?
>>>>>
>>>>> Posting runnable code is the best way to get help.
>>>>>
>
