norm() is even faster than maximum()!
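
For anyone who wants to reproduce the comparison, here is a minimal sketch. It assumes a Julia 1.x session, where norm() comes from the LinearAlgebra standard library and maxabs(x) is spelled maximum(abs, x); on the 0.4 release discussed in this thread both were in Base. The two hand-written loops follow the ones quoted below. Timings will vary by machine and Julia version, so none are shown.

```julia
using LinearAlgebra  # assumption: Julia 1.x, where norm() lives in LinearAlgebra

# Sum-of-squares loop, following mynorm from the quoted message.
function mynorm(x)
    s = zero(x[1]^2)
    @inbounds @simd for i in eachindex(x)
        s += x[i]^2
    end
    sqrt(s)
end

# Running-maximum loop, following mymaxabs from the quoted message.
function mymaxabs(x)
    s = abs(x[1])
    @inbounds @simd for i in eachindex(x)
        s = max(s, abs(x[i]))
    end
    s
end

x = randn(100_000)

# Call everything once first so that compilation is excluded from the timings.
norm(x); maximum(abs, x); mynorm(x); mymaxabs(x)

@time norm(x)
@time maximum(abs, x)
@time mynorm(x)
@time mymaxabs(x)
```

To check whether a loop was actually vectorized, @code_llvm mynorm(x) can be inspected for vector types such as <4 x double>.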

On Wednesday, November 18, 2015 at 8:49:59 PM UTC+1, Tim Holy wrote:
>
> Note also that: 
>
>             function mynorm(x) 
>                   s = zero(x[1]^2) 
>                   @inbounds @simd for I in eachindex(x) 
>                       s += x[I]^2 
>                   end 
>                   sqrt(s) 
>               end 
>
> does get SIMDed. So the difference is almost surely vectorization. 
>
> --Tim 
>
>
> On Wednesday, November 18, 2015 01:38:38 PM Tim Holy wrote: 
> > On Wednesday, November 18, 2015 02:30:01 PM Stefan Karpinski wrote: 
> > > Those numbers don't include any compilation (the allocations are too low).
> > > I'm seeing a similar thing. They're just implemented in really different
> > > ways. maxabs uses mapreduce, which seems to be a chronic source of
> > > less-than-optimal performance.
> > 
> > Not the problem: 
> > 
> > julia> function mymaxabs(x) 
> >            s = abs(x[1]) 
> >            @inbounds @simd for I in eachindex(x) 
> >                s = max(s, abs(x[I])) 
> >            end 
> >            s 
> >        end 
> > mymaxabs (generic function with 1 method) 
> > 
> > julia> x = randn(100000); 
> > 
> > # warmup suppressed 
> > 
> > julia> @time maxabs(x) 
> >   0.000425 seconds (5 allocations: 176 bytes) 
> > 4.513240114499124 
> > 
> > julia> @time mymaxabs(x) 
> >   0.000642 seconds (5 allocations: 176 bytes) 
> > 4.513240114499124 
> > 
> > 
> > (It doesn't actually get SIMDed, though.) 
> > 
> > I'm not entirely surprised. Multiplication is fast, and with 10^5 elements
> > the sqrt should not be the bottleneck.
> > 
> > --Tim 
> > 
> > > On Wed, Nov 18, 2015 at 2:12 PM, Benjamin Deonovic wrote:
> > > > Does norm use maxabs? If so, this could be due to maxabs getting
> > > > compiled.
> > > > Try running both of the timed statements a second time.
> > > > 
> > > > On Wednesday, November 18, 2015 at 10:41:48 AM UTC-6, Sisyphuss wrote:
> > > >> Interesting phenomenon: norm() is faster than maxabs() 
> > > >> 
> > > >> x = randn(100000) 
> > > >> @time maxabs(x) 
> > > >> @time norm(x) 
> > > >> 
> > > >> 
> > > >> 0.000108 seconds (5 allocations: 176 bytes) 
> > > >> 0.000040 seconds (5 allocations: 176 bytes) 
> > > >> 
> > > >> I would have thought the contrary, since norm() requires N squarings
> > > >> and 1 square root, whereas maxabs() requires 2N sign-bit changes and
> > > >> N comparisons.
>
>
