Note also that:
function mynorm(x)
    s = zero(x[1]^2)
    @inbounds @simd for I in eachindex(x)
        s += x[I]^2
    end
    sqrt(s)
end
does get SIMDed. So the difference is almost surely vectorization.
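For reference, a quick way to check is to inspect the generated code. This is just a sketch: the exact LLVM output depends on Julia version and CPU, but a vectorized loop will show packed ops like `<4 x double>` in the IR. Note that `@simd` permits reassociation of the sum, so the result can differ from a sequential loop at the level of floating-point rounding:

```julia
function mynorm(x)
    s = zero(x[1]^2)            # accumulator typed like the elements' squares
    @inbounds @simd for I in eachindex(x)
        s += x[I]^2
    end
    sqrt(s)
end

x = randn(100_000)

# Uncomment to inspect the generated IR; look for <4 x double> (or wider) ops:
# @code_llvm mynorm(x)

mynorm(x)   # agrees with sqrt(sum(abs2, x)) up to floating-point reassociation
```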
--Tim
On Wednesday, November 18, 2015 01:38:38 PM Tim Holy wrote:
> On Wednesday, November 18, 2015 02:30:01 PM Stefan Karpinski wrote:
> > Those numbers don't include any compilation (the allocations are too low).
> > I'm seeing a similar thing. They're just implemented in really different
> > ways. maxabs uses mapreduce, which seems to be a chronic source of
> > less-than-optimal performance.
>
> Not the problem:
>
> julia> function mymaxabs(x)
>            s = abs(x[1])
>            @inbounds @simd for I in eachindex(x)
>                s = max(s, abs(x[I]))
>            end
>            s
>        end
> mymaxabs (generic function with 1 method)
>
> julia> x = randn(100000);
>
> # warmup suppressed
>
> julia> @time maxabs(x)
> 0.000425 seconds (5 allocations: 176 bytes)
> 4.513240114499124
>
> julia> @time mymaxabs(x)
> 0.000642 seconds (5 allocations: 176 bytes)
> 4.513240114499124
>
>
> (It doesn't actually get SIMDed, though.)
>
> I'm not entirely surprised. Multiplication is fast, and with 10^5 elements
> the sqrt should not be the bottleneck.
>
> --Tim
>
> > On Wed, Nov 18, 2015 at 2:12 PM, Benjamin Deonovic <[email protected]>
> >
> > wrote:
> > > Does norm use maxabs? If so this could be due to maxabs getting
> > > compiled.
> > > try running both of the timed statements a second time.
> > >
> > > On Wednesday, November 18, 2015 at 10:41:48 AM UTC-6, Sisyphuss wrote:
> > >> Interesting phenomenon: norm() is faster than maxabs()
> > >>
> > >> x = randn(100000)
> > >> @time maxabs(x)
> > >> @time norm(x)
> > >>
> > >>
> > >> 0.000108 seconds (5 allocations: 176 bytes)
> > >> 0.000040 seconds (5 allocations: 176 bytes)
> > >>
> > >> I would have thought the contrary, since norm() requires N squarings and 1
> > >> square root, while maxabs() requires 2N sign-bit changes and N comparisons.