I have cooked up a minimal test case of your code that displays the problem. If
you have the following:
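# f is passed in as an argument
function test1(f, n)
    for i=1:n
        f(i)
    end
end

# g is called directly by name
function test2(n)
    for i=1:n
        g(i)
    end
end

# f is called through invoke with an explicit signature
function test3(f, n)
    for i=1:n
        invoke(f, (Int,), i)
    end
end

g(i) = i

# The first call of each pair is a warm-up so compilation is not timed
test1(g,10)
@time test1(g,10_000_000)
test2(10)
@time test2(10_000_000)
test3(g,10)
@time test3(g,10_000_000)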
The output will be (test1, test2 and test3, in that order):
elapsed time: 0.38846745 seconds (320032108 bytes allocated, 35.61% gc time)
elapsed time: 1.259e-6 seconds (80 bytes allocated)
elapsed time: 0.814524146 seconds (319983728 bytes allocated, 17.09% gc time)
I think it's because the compiler cannot inline the call to g in the loop of
test1, since f may be a different function the next time test1 is called. This
forces a function lookup at each loop iteration, which would also explain the
allocations. I think https://github.com/JuliaLang/julia/pull/9642 will improve
this, at least for test3.
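
One workaround that is sometimes used for this kind of problem is to pass a value of a dedicated type instead of the function itself, so that the call target is determined by the argument's type and the compiler can specialize on it. The following is only a minimal sketch of that idea, not something taken from your code: the names GCall, call_wrapped and test4 are made up for illustration, the loop accumulates a sum only so that the result can be checked, and it is written in current Julia syntax (on older versions the keyword was `immutable` rather than `struct`). I have not timed it against the numbers above.

```julia
# Hypothetical wrapper type standing in for g (not part of the original code).
struct GCall end

# Hypothetical helper: the method is selected by the wrapper's type,
# so the call target is known from the argument type alone.
call_wrapped(::GCall, i) = i

# Same loop shape as test1, but the "function" arrives as a typed value.
# The sum is accumulated only so there is a result to check.
function test4(f, n)
    s = 0
    for i = 1:n
        s += call_wrapped(f, i)
    end
    return s
end

test4(GCall(), 10)                 # warm-up call so compilation is not timed
@time test4(GCall(), 10_000_000)
```

If the explanation above is right, test4 should behave more like test2 than like test1, but that needs to be measured on the Julia version you are actually running.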