It could be that integer powers are computed in software with binary exponentiation (repeated squaring over the bits of the exponent), while floating-point powers go through the FPU/libm pow routine.
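For what it's worth, here is a minimal sketch of the two code paths I am hypothesizing. The names pow_by_squaring and pow_libm are made up purely for illustration; the actual fast-math methods are the pow_fast definitions quoted below.

    # Integer exponent: binary exponentiation (repeated squaring),
    # O(log n) multiplications done in ordinary floating-point arithmetic.
    function pow_by_squaring(x::Float64, n::Integer)
        n < 0 && return 1 / pow_by_squaring(x, -n)
        result = 1.0
        while n > 0
            isodd(n) && (result *= x)
            x *= x
            n >>= 1
        end
        return result
    end

    # Float exponent: fall through to the ordinary Float64^Float64 method,
    # which ends up calling libm's pow (as the ccall quoted below does directly).
    pow_libm(x::Float64, y::Float64) = x^y

    pow_by_squaring(π/4, 3)   # a handful of multiplies
    pow_libm(π/4, 3.0)        # one call into libm's pow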
On Thursday, September 24, 2015 at 3:47:22 PM UTC-4, Kristoffer Carlsson wrote:
>
> I think Tom is right here. These lines call the pow function
>
>     movabsq $pow, %rax
>     callq   *%rax
>
> but the actual pow function that is being called is different. I am
> surprised there is that much of a difference in performance between the
> two pow functions... That seems odd.
>
> What Mauro says is also interesting: the speed difference is there
> (and is as large) even without the @fastmath macro.
>
> My question now is, what does IEEE say about x^double vs x^int? Is there
> any reason these should have different performance? If not, it seems to
> make sense to always convert the exponent to a double and call the libm
> version. All doubles should be able to exactly represent the integers that
> the power function takes?
>
>
> On Thursday, September 24, 2015 at 9:18:45 PM UTC+2, Mauro wrote:
>>
>> I dissected the bench method into two, just to be sure (on 0.4-RC2).
>>
>> julia> function bench(N)
>>            for i = 1:N
>>                f(π/4)
>>            end
>>        end
>> bench (generic function with 1 method)
>>
>> julia> function bench_f(N)
>>            for i = 1:N
>>                f_float(π/4)
>>            end
>>        end
>> bench_f (generic function with 1 method)
>>
>> They also have identical native code but run at different speeds:
>>
>> julia> @time bench_f(10^7)
>>   0.190613 seconds (5 allocations: 176 bytes)
>>
>> julia> @time bench(10^7)
>>   0.780212 seconds (5 allocations: 176 bytes)
>>
>> I thought that @code_native shows the code which is actually run, so why
>> the different speeds?
>>
>> If I define the f* functions without the @fastmath macro, then I get
>> the same performance as above:
>>
>> julia> @time bench_f(10^7)
>>   0.203071 seconds (5 allocations: 176 bytes)
>>
>> julia> @time bench(10^7)
>>   0.787696 seconds (5 allocations: 176 bytes)
>>
>> but with different native code.
>>
>> > I can reproduce... I think the 2 versions will call these methods
>> > respectively... I guess there's a performance difference?
>> >
>> >     pow_fast{T<:FloatTypes}(x::T, y::Integer) =
>> >         box(T, Base.powi_llvm(unbox(T,x), unbox(Int32,Int32(y))))
>> >
>> >     pow_fast(x::Float64, y::Float64) =
>> >         ccall(("pow",libm), Float64, (Float64,Float64), x, y)
>>
>> Tom, or are those two functions called within the native code? I'm no
>> good assembler reader.
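(For readers joining mid-thread: the f and f_float definitions come from earlier messages and are not quoted above. A self-contained reconstruction of the comparison, where the exact bodies and the exponent 3 are my assumptions and only the type of the exponent matters, would look something like this:)

    @fastmath function f(x)
        return x^3        # Int exponent   -> the pow_fast(::Float64, ::Integer) method
    end

    @fastmath function f_float(x)
        return x^3.0      # Float exponent -> the pow_fast(::Float64, ::Float64) method
    end

    function bench(N)
        for i = 1:N
            f(π/4)
        end
    end

    function bench_f(N)
        for i = 1:N
            f_float(π/4)
        end
    end

    # Warm up to compile both paths, then time each loop.
    bench(1); bench_f(1)
    @time bench(10^7)
    @time bench_f(10^7)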