In my experience a closure is much faster than a function that accesses
global variables, even if it is passed to another function as an argument.
Here are two examples. First, we pass a function that accesses a (non-const)
global to another function:
a = 3

g(f::Function, b, N) = f(b, N)

function f_glob(b, N)
    s = 0
    for i = 1:N
        s += a
    end
    return s
end
@time g(f_glob, 3, 10^6)
3000000
0.020655 seconds (999.84 k allocations: 15.256 MB, 18.40% gc time)
Here we instead use an outer function that generates the desired closure
and pass the closure as the function argument:
function f_clos(b, N, a)
    function f_inner(b, N)
        s = 0
        for i = 1:N
            s += a
        end
        return s
    end
    g(f_inner, b, N)
end
@time f_clos(3, 10^6, 2)
2000000
0.000021 seconds (19 allocations: 928 bytes)
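For comparison, a third variant (not in my original timings, just a sketch) is
to declare the global `const`; the binding then has a known type and the loop
compiles to fast code without any closure:

```julia
# Sketch: a const global is type-stable, so the compiler
# can emit an efficient loop (names here are hypothetical).
const a_const = 3

function f_const(b, N)
    s = 0
    for i = 1:N
        s += a_const
    end
    return s
end

f_const(3, 10^6)  # returns 3000000
```

Note that `const` only addresses the type-stability part; as the reply below
points out, passing functions as arguments has its own overhead in current
Julia.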
On Tuesday, September 29, 2015 at 8:47:06 AM UTC+2, Tomas Lycken wrote:
>
> The performance penalty from global variables comes from the fact that
> non-const global variables are type unstable, which means that the compiler
> has to generate very defensive, and thus slow, code. The same thing is, for
> now, true also for anonymous functions and functions passed as parameters
> to other functions, so switching one paradigm for the other will
> unfortunately not help much for performance. I haven't looked closely at
> your code, but the [FastAnonymous.jl](
> https://github.com/timholy/FastAnonymous.jl) package can probably help
> with making the anonymous function version faster.
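> One way to see the instability described above (a sketch, not part of the
> original reply) is `@code_warntype`, which flags variables inferred as
> `Any`:
>
> ```
> a = 3  # non-const global
>
> function f_glob2(b, N)
>     s = 0
>     for i = 1:N
>         s += a  # looked up at run time, inferred as Any
>     end
>     return s
> end
>
> # The accumulator `s` is reported as Any, which forces the
> # boxing and allocations seen in the timings above.
> @code_warntype f_glob2(3, 10)
> ```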
>
> If the data you want to pass along doesn't change, using *const* globals
> is fine (at least from a performance perspective, you might have other
> reasons to avoid them too...), i.e. you can do the following and it will
> also be faster:
>
> ```
> const data = 3
>
> function model(x)
>     # use data somehow
> end
>
> # use the model function in your optimization
> ```
>
> If you need the data to change, you can still use a `const` global
> variable, but set it to some mutable (and type-stable!) object, for example
>
> ```
> type MyData
>     data::Int
> end
>
> const data = MyData(3)
>
> function model(x)
>     data.data += 1
>     # ...
> end
>
> # use the model function in your optimization
> # it can now modify the global data on each invocation
> ```
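> A minimal usage sketch of that pattern (names hypothetical, `type` as in
> the Julia version of this thread):
>
> ```
> type Counter
>     n::Int
> end
>
> const state = Counter(0)  # const binding, mutable contents
>
> function model(x)
>     state.n += 1  # mutating the field is allowed
>     x + state.n
> end
>
> model(0)  # -> 1
> model(0)  # -> 2
> ```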
>
> This doesn't necessarily provide the cleanest interface, but it should be
> more performant than your current solution.
>
> // T
>
>
> On Monday, September 28, 2015 at 6:53:22 PM UTC+2, Christopher Fisher
> wrote:
>>
>> Thanks for your willingness to investigate the matter more closely. I
>> cannot post the exact code I am using (besides, it's rather long). However, I
>> posted a toy example that follows the same basic operations. Essentially,
>> my code involves an outer function (SubLoop) that loops through a data set
>> with multiple subjects. The model is fit to each subject's data. The other
>> function (LogLike) computes the log likelihood and is called by optimize.
>> The first set of code corresponds to the closure method and the second set
>> of code corresponds to the global variable method. In both cases, the code
>> executed in about .85 seconds over several runs on my computer and has
>> about 1.9% gc time. I'm also aware that my code is probably not optimized
>> in other regards. So I would be receptive to any other advice you might
>> have.
>>
>>
>>
>>
>> using Distributions, Optim
>>
>> function SubLoop1(data1)
>>     function LogLike1(parms)
>>         L = pdf(Normal(parms[1], exp(parms[2])), SubData)
>>         LL = -sum(log(L))
>>     end
>>
>>     # Number of subjects
>>     Nsub = size(unique(data1[:,1], 1), 1)
>>
>>     # Initialize per-subject data
>>     SubData = []
>>
>>     for i = 1:Nsub
>>         idx = data1[:,1] .== i
>>         SubData = data1[idx, 2]
>>         parms0 = [1.0; 1.0]
>>         optimize(LogLike1, parms0, method = :nelder_mead)
>>     end
>> end
>>
>> N = 10^5
>>
>> # Column 1: subject index, column 2: value
>> Data = zeros(N*2, 2)
>>
>> for sub = 1:2
>>     Data[(N*(sub-1)+1):(N*sub), :] = [sub*ones(N) rand(Normal(10, 2), N)]
>> end
>>
>> @time SubLoop1(Data)
>>
>>
>> using Distributions, Optim
>>
>> function SubLoop2(data1)
>>     global SubData
>>
>>     # Number of subjects
>>     Nsub = size(unique(data1[:,1], 1), 1)
>>
>>     # Initialize per-subject data
>>     SubData = []
>>
>>     for i = 1:Nsub
>>         idx = data1[:,1] .== i
>>         SubData = data1[idx, 2]
>>         parms0 = [1.0; 1.0]
>>         optimize(LogLike2, parms0, method = :nelder_mead)
>>     end
>> end
>>
>> function LogLike2(parms)
>>     L = pdf(Normal(parms[1], exp(parms[2])), SubData)
>>     LL = -sum(log(L))
>> end
>>
>> N = 10^5
>>
>> # Column 1: subject index, column 2: value
>> Data = zeros(N*2, 2)
>>
>> for sub = 1:2
>>     Data[(N*(sub-1)+1):(N*sub), :] = [sub*ones(N) rand(Normal(10, 2), N)]
>> end
>>
>> @time SubLoop2(Data)
>>
>> On Monday, September 28, 2015 at 11:24:13 AM UTC-4, Kristoffer Carlsson
>> wrote:
>>>
>>> From only that comment alone it is hard to give any further advice.
>>>
>>> What overhead are you seeing?
>>>
>>> Posting runnable code is the best way to get help.