It could be that integer powers are computed in software with binary exponentiation (repeated squaring over the bits of the exponent), while floating-point powers go through the FPU/libm pow routine.
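For what it's worth, here is a minimal sketch of the two code paths I am hypothesizing. The names pow_by_squaring and pow_libm are made up purely for illustration; the actual fast-math methods are the pow_fast definitions quoted below.

    # Integer exponent: binary exponentiation (repeated squaring),
    # O(log n) multiplications done in ordinary floating-point arithmetic.
    function pow_by_squaring(x::Float64, n::Integer)
        n < 0 && return 1 / pow_by_squaring(x, -n)
        result = 1.0
        while n > 0
            isodd(n) && (result *= x)
            x *= x
            n >>= 1
        end
        return result
    end

    # Float exponent: fall through to the ordinary Float64^Float64 method,
    # which ends up calling libm's pow (as the ccall quoted below does directly).
    pow_libm(x::Float64, y::Float64) = x^y

    pow_by_squaring(π/4, 3)   # a handful of multiplies
    pow_libm(π/4, 3.0)        # one call into libm's pow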
On Thursday, September 24, 2015 at 3:47:22 PM UTC-4, Kristoffer Carlsson wrote:
>
> I think Tom is right here. These lines call the pow function
>
>     movabsq $pow, %rax
>     callq   *%rax
>
> but the actual pow function that is being called is different. I am
> surprised there is that much of a difference in performance between the
> two pow functions... That seems odd.
>
> What Mauro says is also interesting: the speed difference is there
> (and is as large) even without the @fastmath macro.
>
> My question now is, what does IEEE say about x^double vs x^int? Is there
> any reason these should have different performance? If not, it seems to
> make sense to always convert the exponent to a double and call the libm
> version. All doubles should be able to exactly represent the integers that
> the power function takes?
>
>
> On Thursday, September 24, 2015 at 9:18:45 PM UTC+2, Mauro wrote:
>>
>> I dissected the bench method into two, just to be sure (on 0.4-RC2).
>>
>> julia> function bench(N)
>>            for i = 1:N
>>                f(π/4)
>>            end
>>        end
>> bench (generic function with 1 method)
>>
>> julia> function bench_f(N)
>>            for i = 1:N
>>                f_float(π/4)
>>            end
>>        end
>> bench_f (generic function with 1 method)
>>
>> They also have identical native code but run at different speeds:
>>
>> julia> @time bench_f(10^7)
>>   0.190613 seconds (5 allocations: 176 bytes)
>>
>> julia> @time bench(10^7)
>>   0.780212 seconds (5 allocations: 176 bytes)
>>
>> I thought that @code_native shows the code which is actually run, so why
>> the different speeds?
>>
>> If I define the f* functions without the @fastmath macro, then I get
>> the same performance as above:
>>
>> julia> @time bench_f(10^7)
>>   0.203071 seconds (5 allocations: 176 bytes)
>>
>> julia> @time bench(10^7)
>>   0.787696 seconds (5 allocations: 176 bytes)
>>
>> but with different native code.
>>
>> > I can reproduce... I think the 2 versions will call these methods
>> > respectively... I guess there's a performance difference?
>> >
>> >     pow_fast{T<:FloatTypes}(x::T, y::Integer) =
>> >         box(T, Base.powi_llvm(unbox(T,x), unbox(Int32,Int32(y))))
>> >
>> >     pow_fast(x::Float64, y::Float64) =
>> >         ccall(("pow",libm), Float64, (Float64,Float64), x, y)
>>
>> Tom, or are those two functions called within the native code? I'm no
>> good assembler reader.
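(For readers joining mid-thread: the f and f_float definitions come from earlier messages and are not quoted above. A self-contained reconstruction of the comparison, where the exact bodies and the exponent 3 are my assumptions and only the type of the exponent matters, would look something like this:)

    @fastmath function f(x)
        return x^3        # Int exponent   -> the pow_fast(::Float64, ::Integer) method
    end

    @fastmath function f_float(x)
        return x^3.0      # Float exponent -> the pow_fast(::Float64, ::Float64) method
    end

    function bench(N)
        for i = 1:N
            f(π/4)
        end
    end

    function bench_f(N)
        for i = 1:N
            f_float(π/4)
        end
    end

    # Warm up to compile both paths, then time each loop.
    bench(1); bench_f(1)
    @time bench(10^7)
    @time bench_f(10^7)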