norm() is even faster than maximum()!
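For anyone who wants to reproduce this, here is a minimal sketch rather than a definitive benchmark. It assumes a current Julia (1.x), where norm lives in the LinearAlgebra standard library and the old maxabs(x) is spelled maximum(abs, x). The mynorm and mymaxabs loops are the hand-written versions from the quoted messages; the warm-up line is my own addition so that compilation is not included in the timings. Timings will vary by machine and Julia version, so no expected numbers are shown.

```julia
using LinearAlgebra  # assumption: Julia 1.x, where norm is no longer in Base

# Hand-written 2-norm, as posted in the thread.
function mynorm(x)
    s = zero(x[1]^2)
    @inbounds @simd for I in eachindex(x)
        s += x[I]^2
    end
    sqrt(s)
end

# Hand-written maximum of absolute values, as posted in the thread.
function mymaxabs(x)
    s = abs(x[1])
    @inbounds @simd for I in eachindex(x)
        s = max(s, abs(x[I]))
    end
    s
end

x = randn(100000)

# Warm-up: call everything once so @time below does not measure compilation.
norm(x); maximum(x); maximum(abs, x); mynorm(x); mymaxabs(x)

@time norm(x)
@time maximum(x)
@time maximum(abs, x)   # modern spelling of maxabs(x)
@time mynorm(x)
@time mymaxabs(x)
```

For steadier numbers than a single @time call gives, repeating each measurement several times (or using a benchmarking package) would be the safer choice.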

On Wednesday, November 18, 2015 at 8:49:59 PM UTC+1, Tim Holy wrote:
>
> Note also that:
>
> function mynorm(x)
>     s = zero(x[1]^2)
>     @inbounds @simd for I in eachindex(x)
>         s += x[I]^2
>     end
>     sqrt(s)
> end
>
> does get SIMDed. So the difference is almost surely vectorization.
>
> --Tim
>
> On Wednesday, November 18, 2015 01:38:38 PM Tim Holy wrote:
> > On Wednesday, November 18, 2015 02:30:01 PM Stefan Karpinski wrote:
> > > Those numbers don't include any compilation (the allocations are too low).
> > > I'm seeing a similar thing. They're just implemented in really different
> > > ways. maxabs uses mapreduce, which seems to be a chronic source of
> > > less-than-optimal performance.
> >
> > Not the problem:
> >
> > julia> function mymaxabs(x)
> >            s = abs(x[1])
> >            @inbounds @simd for I in eachindex(x)
> >                s = max(s, abs(x[I]))
> >            end
> >            s
> >        end
> > mymaxabs (generic function with 1 method)
> >
> > julia> x = randn(100000);
> >
> > # warmup suppressed
> >
> > julia> @time maxabs(x)
> >   0.000425 seconds (5 allocations: 176 bytes)
> > 4.513240114499124
> >
> > julia> @time mymaxabs(x)
> >   0.000642 seconds (5 allocations: 176 bytes)
> > 4.513240114499124
> >
> > (It doesn't actually get SIMDed, though.)
> >
> > I'm not entirely surprised. Multiplication is fast, and with 10^5 elements
> > the sqrt should not be the bottleneck.
> >
> > --Tim
> >
> > > On Wed, Nov 18, 2015 at 2:12 PM, Benjamin Deonovic wrote:
> > > > Does norm use maxabs? If so this could be due to maxabs getting
> > > > compiled.
> > > > try running both of the timed statements a second time.
> > > >
> > > > On Wednesday, November 18, 2015 at 10:41:48 AM UTC-6, Sisyphuss wrote:
> > > >> Interesting phenomenon: norm() is faster than maxabs()
> > > >>
> > > >> x = randn(100000)
> > > >> @time maxabs(x)
> > > >> @time norm(x)
> > > >>
> > > >> 0.000108 seconds (5 allocations: 176 bytes)
> > > >> 0.000040 seconds (5 allocations: 176 bytes)
> > > >>
> > > >> I have thought the contrary, for norm() requires N square and 1 square
> > > >> root; maxabs() requires 2N change of sign bit and N comparison.
