I have a small test case that is slow when the loop is timed directly
inside main(), but fast when the same loop is wrapped in its own function:

using Images

function expensive(img)
    img[1, 2] * img[3, 4] + img[5, 6] - img[7, 8]
end

function benchmark(img)
    for i in 1:1000000
        expensive(img)
    end
end

function main()
    img = Image(float32(randn(10, 10)))

    # this is fast
    gc_disable()
    @time benchmark(img)
    gc_enable()

    # this is slow
    gc_disable()
    @time for i in 1:1000000
        expensive(img)
    end
    gc_enable()
end

main()

According to Tim Holy (Images.jl issue #74), this is because Julia can't
inline the getindex call when expensive() is called from the loop in main()
rather than from the loop inside benchmark(). Why is that, though? Isn't img
a local variable with a known type, which should result in a fully
type-inferred version of expensive()? Why is the code specialized further
when the call happens "one level deeper"?

Oddly enough, making img a global improves the performance of the slow case
by about 50% and doesn't change the fast case. Now I'm confused.
Best,
Tim