In my experience a closure is much faster than a function that accesses
global variables, even if it is passed to another function as an argument.
Here are two examples. First, we pass a function that accesses a (non-const)
global to another function:
a = 3

g(f::Function, b, N) = f(b, N)

function f_glob(b, N)
    s = 0
    for i = 1:N
        s += a
    end
    return s
end
@time g(f_glob, 3, 10^6)
3000000
0.020655 seconds (999.84 k allocations: 15.256 MB, 18.40% gc time)
Here we instead use an outer function that generates the desired closure
and pass the closure as the function argument:
function f_clos(b, N, a)
    function f_inner(b, N)
        s = 0
        for i = 1:N
            s += a
        end
        return s
    end
    g(f_inner, b, N)
end
@time f_clos(3, 10^6, 2)
2000000
0.000021 seconds (19 allocations: 928 bytes)
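For comparison, a third variant (not in my original timings, just a sketch) is
to declare the global `const`; the binding then has a known type and the loop
compiles to fast code without any closure:

```julia
# Sketch: a const global is type-stable, so the compiler
# can emit an efficient loop (names here are hypothetical).
const a_const = 3

function f_const(b, N)
    s = 0
    for i = 1:N
        s += a_const
    end
    return s
end

f_const(3, 10^6)  # returns 3000000
```

Note that `const` only addresses the type-stability part; as the reply below
points out, passing functions as arguments has its own overhead in current
Julia.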
On Tuesday, September 29, 2015 at 8:47:06 AM UTC+2, Tomas Lycken wrote:
>
> The performance penalty from global variables comes from the fact that
> non-const global variables are type unstable, which means that the compiler
> has to generate very defensive, and thus slow, code. The same thing is, for
> now, true also for anonymous functions and functions passed as parameters
> to other functions, so switching one paradigm for the other will
> unfortunately not help much for performance. I haven't looked closely at
> your code, but the [FastAnonymous.jl](
> https://github.com/timholy/FastAnonymous.jl) package can probably help
> with making the anonymous function version faster.
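> One way to see the instability described above (a sketch, not part of the
> original reply) is `@code_warntype`, which flags variables inferred as
> `Any`:
>
> ```
> a = 3  # non-const global
>
> function f_glob2(b, N)
>     s = 0
>     for i = 1:N
>         s += a  # looked up at run time, inferred as Any
>     end
>     return s
> end
>
> # The accumulator `s` is reported as Any, which forces the
> # boxing and allocations seen in the timings above.
> @code_warntype f_glob2(3, 10)
> ```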
>
> If the data you want to pass along doesn't change, using *const* globals
> is fine (at least from a performance perspective, you might have other
> reasons to avoid them too...), i.e. you can do the following and it will
> also be faster:
>
> ```
> const data = 3
>
> function model(x)
>     # use data somehow
> end
>
> # use the model function in your optimization
> ```
>
> If you need the data to change, you can still use a `const` global
> variable, but set it to some mutable (and type-stable!) object, for example
>
> ```
> type MyData
>     data::Int
> end
>
> const data = MyData(3)
>
> function model(x)
>     data.data += 1
>     # ...
> end
>
> # use the model function in your optimization
> # it can now modify the global data on each invocation
> ```
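> A minimal usage sketch of that pattern (names hypothetical, `type` as in
> the Julia version of this thread):
>
> ```
> type Counter
>     n::Int
> end
>
> const state = Counter(0)  # const binding, mutable contents
>
> function model(x)
>     state.n += 1  # mutating the field is allowed
>     x + state.n
> end
>
> model(0)  # -> 1
> model(0)  # -> 2
> ```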
>
> This doesn't necessarily provide the cleanest interface, but it should be
> more performant than your current solution.
>
> // T
>
>
> On Monday, September 28, 2015 at 6:53:22 PM UTC+2, Christopher Fisher
> wrote:
>>
>> Thanks for your willingness to investigate the matter more closely. I
>> cannot post the exact code I am using (besides, it's rather long). However, I
>> posted a toy example that follows the same basic operations. Essentially,
>> my code involves an outer function (SubLoop) that loops through a data set
>> with multiple subjects. The model is fit to each subject's data. The other
>> function (LogLike) computes the log likelihood and is called by optimize.
>> The first set of code corresponds to the closure method and the second set
>> of code corresponds to the global variable method. In both cases, the code
>> executed in about .85 seconds over several runs on my computer and has
>> about 1.9% gc time. I'm also aware that my code is probably not optimized
>> in other regards. So I would be receptive to any other advice you might
>> have.
>>
>>
>>
>>
>> using Distributions, Optim
>>
>> function SubLoop1(data1)
>>     function LogLike1(parms)
>>         L = pdf(Normal(parms[1], exp(parms[2])), SubData)
>>         LL = -sum(log(L))
>>     end
>>
>>     # Number of subjects
>>     Nsub = size(unique(data1[:,1], 1), 1)
>>
>>     # Initialize per-subject data
>>     SubData = []
>>
>>     for i = 1:Nsub
>>         idx = data1[:,1] .== i
>>         SubData = data1[idx, 2]
>>         parms0 = [1.0; 1.0]
>>         optimize(LogLike1, parms0, method = :nelder_mead)
>>     end
>> end
>>
>> N = 10^5
>>
>> # Column 1: subject index, column 2: value
>> Data = zeros(N*2, 2)
>>
>> for sub = 1:2
>>     Data[(N*(sub-1)+1):(N*sub), :] = [sub*ones(N) rand(Normal(10, 2), N)]
>> end
>>
>> @time SubLoop1(Data)
>>
>>
>> using Distributions, Optim
>>
>> function SubLoop2(data1)
>>     global SubData
>>
>>     # Number of subjects
>>     Nsub = size(unique(data1[:,1], 1), 1)
>>
>>     # Initialize per-subject data
>>     SubData = []
>>
>>     for i = 1:Nsub
>>         idx = data1[:,1] .== i
>>         SubData = data1[idx, 2]
>>         parms0 = [1.0; 1.0]
>>         optimize(LogLike2, parms0, method = :nelder_mead)
>>     end
>> end
>>
>> function LogLike2(parms)
>>     L = pdf(Normal(parms[1], exp(parms[2])), SubData)
>>     LL = -sum(log(L))
>> end
>>
>> N = 10^5
>>
>> # Column 1: subject index, column 2: value
>> Data = zeros(N*2, 2)
>>
>> for sub = 1:2
>>     Data[(N*(sub-1)+1):(N*sub), :] = [sub*ones(N) rand(Normal(10, 2), N)]
>> end
>>
>> @time SubLoop2(Data)
>>
>> On Monday, September 28, 2015 at 11:24:13 AM UTC-4, Kristoffer Carlsson
>> wrote:
>>>
>>> From only that comment alone it is hard to give any further advice.
>>>
>>> What overhead are you seeing?
>>>
>>> Posting runnable code is the best way to get help.