Kristoffer, your example still uses a *non-const* global. Using this
benchmark <https://gist.github.com/tlycken/f0dc12b7c211f73f6cc1>, I get the
following results:
Non-const global
0.017629 seconds (999.84 k allocations: 15.256 MB)
Const global
0.000002 seconds (6 allocations: 192 bytes)
Closure
0.000008 seconds (19 allocations: 928 bytes)
So yes, a closure is much better than using a non-const global, but using a
const global still beats it. However, both are fast enough that it probably
won’t be the bottleneck in your application anymore :)
Although I can by no means fully read and understand them, it is also
instructive to look at the dumps from @code_llvm to get a picture of the
difference in code complexity:
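For reference, the const-global variant in the gist is presumably along these lines (a sketch; the actual gist may differ in details):

```
const a_const = 3

function f_const_glob(N)
    s = 0
    for i = 1:N
        s += a_const   # a_const has a known, fixed type, so the loop is type stable
    end
    return s
end
```

Since the compiler knows `a_const` is always an `Int`, LLVM can reduce the entire loop to a single multiplication, which is exactly what shows up in the dump as `%.op = mul i64 %0, 3`.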
Const global:
julia> @code_llvm f_const_glob(10^5)
define i64 @julia_f_const_glob_2617(i64) {
top:
%1 = icmp slt i64 %0, 1
br i1 %1, label %L3, label %L.preheader
L.preheader: ; preds = %top
%2 = icmp sgt i64 %0, 0
%.op = mul i64 %0, 3
%3 = select i1 %2, i64 %.op, i64 0
br label %L3
L3: ; preds = %L.preheader, %top
%s.1 = phi i64 [ 0, %top ], [ %3, %L.preheader ]
ret i64 %s.1
}
Closure:
julia> @code_llvm f_clos(10^5, 3)
define %jl_value_t* @julia_f_clos_2616(i64, i64) {
top:
%2 = alloca [5 x %jl_value_t*], align 8
%.sub = getelementptr inbounds [5 x %jl_value_t*]* %2, i64 0, i64 0
%3 = getelementptr [5 x %jl_value_t*]* %2, i64 0, i64 2
%4 = getelementptr [5 x %jl_value_t*]* %2, i64 0, i64 3
store %jl_value_t* inttoptr (i64 6 to %jl_value_t*), %jl_value_t** %.sub, align 8
%5 = getelementptr [5 x %jl_value_t*]* %2, i64 0, i64 1
%6 = load %jl_value_t*** @jl_pgcstack, align 8
%.c = bitcast %jl_value_t** %6 to %jl_value_t*
store %jl_value_t* %.c, %jl_value_t** %5, align 8
store %jl_value_t** %.sub, %jl_value_t*** @jl_pgcstack, align 8
store %jl_value_t* null, %jl_value_t** %3, align 8
store %jl_value_t* null, %jl_value_t** %4, align 8
%7 = getelementptr [5 x %jl_value_t*]* %2, i64 0, i64 4
store %jl_value_t* null, %jl_value_t** %7, align 8
%8 = load %jl_value_t** inttoptr (i64 2147534600 to %jl_value_t**), align 8
store %jl_value_t* %8, %jl_value_t** %4, align 8
%9 = load %jl_value_t** inttoptr (i64 2164502072 to %jl_value_t**), align 8
store %jl_value_t* %9, %jl_value_t** %7, align 8
%10 = call %jl_value_t* @jl_f_instantiate_type(%jl_value_t* null, %jl_value_t** %4, i32 2)
store %jl_value_t* %10, %jl_value_t** %4, align 8
%11 = call %jl_value_t* @jl_f_svec(%jl_value_t* null, %jl_value_t** null, i32 0)
store %jl_value_t* %11, %jl_value_t** %7, align 8
%12 = call %jl_value_t* @jl_f_svec(%jl_value_t* null, %jl_value_t** %4, i32 2)
store %jl_value_t* %12, %jl_value_t** %4, align 8
%13 = call %jl_value_t* @jl_box_int64(i64 signext %1)
store %jl_value_t* %13, %jl_value_t** %7, align 8
%14 = call %jl_value_t* (i64, ...)* @jl_svec(i64 1, %jl_value_t* %13)
store %jl_value_t* %14, %jl_value_t** %7, align 8
%15 = call %jl_value_t* @jl_new_closure(i8* null, %jl_value_t* %14, %jl_value_t* inttoptr (i64 2221717744 to %jl_value_t*))
store %jl_value_t* %15, %jl_value_t** %7, align 8
%16 = load %jl_value_t** @jl_false, align 8
%17 = call %jl_value_t* @jl_method_def(%jl_value_t* inttoptr (i64 507171808 to %jl_value_t*), %jl_value_t** %3, %jl_value_t* null, %jl_value_t* null, %jl_value_t* %12, %jl_value_t* %15, %jl_value_t* %16, %jl_value_t* inttoptr (i64 2179532144 to %jl_value_t*), i32 0)
%18 = load %jl_value_t** %3, align 8
%19 = bitcast %jl_value_t* %18 to %jl_value_t* (%jl_value_t*, %jl_value_t**, i32)**
%20 = load %jl_value_t* (%jl_value_t*, %jl_value_t**, i32)** %19, align 8
%21 = call %jl_value_t* @jl_box_int64(i64 signext %0)
store %jl_value_t* %21, %jl_value_t** %4, align 8
%22 = call %jl_value_t* %20(%jl_value_t* %18, %jl_value_t** %4, i32 1)
%23 = load %jl_value_t** %5, align 8
%24 = getelementptr inbounds %jl_value_t* %23, i64 0, i32 0
store %jl_value_t** %24, %jl_value_t*** @jl_pgcstack, align 8
ret %jl_value_t* %22
}
On Tuesday, September 29, 2015 at 9:18:29 AM UTC+2, Kristoffer Carlsson
wrote:
In my experience a closure is much faster than a function that accesses
> global variables even if it is passed to another function as an argument.
>
> Two different examples. Here we pass a function that accesses (non const)
> globals to another function:
>
> a = 3
>
> g(f::Function, b, N) = f(b, N)
>
> function f_glob(b, N)
> s = 0
> for i = 1:N
> s += a
> end
> return s
> end
>
> @time g(f_glob, 3, 10^6)
>
> 3000000
>
> 0.020655 seconds (999.84 k allocations: 15.256 MB, 18.40% gc time)
>
>
> Here we instead use an outer function that generates the desired closure
> and pass the closure as the function argument:
>
> function f_clos(b, N, a)
> function f_inner(b, N)
> s = 0
> for i = 1:N
> s += a
> end
> return s
> end
>
> g(f_inner, b, N)
> end
>
> @time f_clos(3, 10^6, 2)
>
> 2000000
>
> 0.000021 seconds (19 allocations: 928 bytes)
>
>
>
>
> On Tuesday, September 29, 2015 at 8:47:06 AM UTC+2, Tomas Lycken wrote:
>>
>> The performance penalty from global variables comes from the fact that
>> non-const global variables are type unstable, which means that the compiler
>> has to generate very defensive, and thus slow, code. The same thing is, for
>> now, true also for anonymous functions and functions passed as parameters
>> to other functions, so switching one paradigm for the other will
>> unfortunately not help much for performance. I haven't looked closely at
>> your code, but the [FastAnonymous.jl](
>> https://github.com/timholy/FastAnonymous.jl) package can probably help
>> with making the anonymous function version faster.
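>> For illustration, usage would presumably look something like this (`@anon`
>> is the macro from that package's README; treat this as an untested sketch):
>>
>> ```
>> using FastAnonymous
>>
>> a = 3
>> squareplus = @anon x -> x^2 + a  # the current value of `a` gets baked in
>> map(squareplus, [1, 2, 3])       # usable anywhere a function is expected
>> ```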
>>
>> If the data you want to pass along doesn't change, using *const* globals
>> is fine (at least from a performance perspective, you might have other
>> reasons to avoid them too...), i.e. you can do the following and it will
>> also be faster:
>>
>> ```
>> const data = 3
>>
>> function model(x)
>> # use data somehow
>> end
>>
>> # use the model function in your optimization
>> ```
>>
>> If you need the data to change, you can still use a `const` global
>> variable, but set it to some mutable (and type-stable!) object, for example
>>
>> ```
>> type MyData
>> data::Int
>> end
>>
>> const data = MyData(3)
>>
>> function model(x)
>> data.data += 1
>> # ...
>> end
>>
>> # use the model function in your optimization
>> # it can now modify the global data on each invocation
>> ```
>>
>> This doesn't necessarily provide the cleanest interface, but it should be
>> more performant than your current solution.
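>> Applied to your setting, the idea would be roughly as follows (hypothetical
>> names, untested sketch; the concretely typed field is what keeps `LogLike`
>> type stable):
>>
>> ```
>> type SubDataHolder
>>     v::Vector{Float64}
>> end
>>
>> const subdata = SubDataHolder(Float64[])
>>
>> function LogLike(parms)
>>     L = pdf(Normal(parms[1], exp(parms[2])), subdata.v)
>>     return -sum(log(L))
>> end
>>
>> # inside the per-subject loop:
>> # subdata.v = data1[idx, 2]
>> # optimize(LogLike, parms0, method=:nelder_mead)
>> ```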
>>
>> // T
>>
>>
>> On Monday, September 28, 2015 at 6:53:22 PM UTC+2, Christopher Fisher
>> wrote:
>>>
>>> Thanks for your willingness to investigate the matter more closely. I
>>> cannot post the exact code I am using (besides, it's rather long). However, I
>>> posted a toy example that follows the same basic operations. Essentially,
>>> my code involves an outer function (SubLoop) that loops through a data set
>>> with multiple subjects. The model is fit to each subject's data. The other
>>> function (LogLike) computes the log likelihood and is called by optimize.
>>> The first set of code corresponds to the closure method and the second set
>>> of code corresponds to the global variable method. In both cases, the code
>>> executed in about 0.85 seconds over several runs on my computer, with
>>> about 1.9% gc time. I'm also aware that my code is probably not optimized
>>> in other regards, so I would be receptive to any other advice you might
>>> have.
>>>
>>>
>>>
>>>
>>> using Distributions,Optim
>>>
>>> function SubLoop1(data1)
>>>
>>> function LogLike1(parms)
>>>
>>> L = pdf(Normal(parms[1],exp(parms[2])),SubData)
>>>
>>> LL = -sum(log(L))
>>>
>>> end
>>>
>>> #Number of Subjects
>>>
>>> Nsub = size(unique(data1[:,1],1),1)
>>>
>>> #Initialize per subject Data
>>>
>>> SubData = []
>>>
>>> for i = 1:Nsub
>>>
>>> idx = data1[:,1] .== i
>>>
>>> SubData = data1[idx,2]
>>>
>>> parms0 = [1.0;1.0]
>>>
>>> optimize(LogLike1,parms0,method=:nelder_mead)
>>>
>>> end
>>>
>>> end
>>>
>>>
>>>
>>> N = 10^5
>>>
>>> #Column 1 subject index, column 2 value
>>>
>>> Data = zeros(N*2,2)
>>>
>>> for sub = 1:2
>>>
>>> Data[(N*(sub-1)+1):(N*sub),:] = [sub*ones(N) rand(Normal(10,2),N)]
>>>
>>> end
>>>
>>> @time SubLoop1(Data)
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> using Distributions, Optim
>>>
>>> function SubLoop2(data1)
>>>
>>> global SubData
>>>
>>> #Number of subjects
>>>
>>> Nsub = size(unique(data1[:,1],1),1)
>>>
>>> #Initialize per subject data
>>>
>>> SubData = []
>>>
>>> for i = 1:Nsub
>>>
>>> idx = data1[:,1] .== i
>>>
>>> SubData = data1[idx,2]
>>>
>>> parms0 = [1.0;1.0]
>>>
>>> optimize(LogLike2,parms0,method=:nelder_mead)
>>>
>>> end
>>>
>>> end
>>>
>>>
>>>
>>> function LogLike2(parms)
>>>
>>> L = pdf(Normal(parms[1],exp(parms[2])),SubData)
>>>
>>> LL = -sum(log(L))
>>>
>>> end
>>>
>>>
>>>
>>> N = 10^5
>>>
>>> #Column 1 subject index, column 2 value
>>>
>>> Data = zeros(N*2,2)
>>>
>>> for sub = 1:2
>>>
>>> Data[(N*(sub-1)+1):(N*sub),:] = [sub*ones(N) rand(Normal(10,2),N)]
>>>
>>> end
>>>
>>> @time SubLoop2(Data)
>>>
>>>
>>>
>>> On Monday, September 28, 2015 at 11:24:13 AM UTC-4, Kristoffer Carlsson
>>> wrote:
>>>>
>>>> From only that comment alone it is hard to give any further advice.
>>>>
>>>> What overhead are you seeing?
>>>>
>>>> Posting runnable code is the best way to get help.
>>>>
>>>>