That makes sense – thanks for the thorough analysis, Tim.

On Wed, Nov 18, 2015 at 2:49 PM, Tim Holy <[email protected]> wrote:

> Note also that:
>
>     function mynorm(x)
>         s = zero(x[1]^2)
>         @inbounds @simd for I in eachindex(x)
>             s += x[I]^2
>         end
>         sqrt(s)
>     end
>
> does get SIMDed. So the difference is almost surely vectorization.
>
> --Tim
>
>
> On Wednesday, November 18, 2015 01:38:38 PM Tim Holy wrote:
> > On Wednesday, November 18, 2015 02:30:01 PM Stefan Karpinski wrote:
> > > Those numbers don't include any compilation (the allocations are too
> > > low). I'm seeing a similar thing. They're just implemented in really
> > > different ways. maxabs uses mapreduce, which seems to be a chronic
> > > source of less-than-optimal performance.
> >
> > Not the problem:
> >
> > julia> function mymaxabs(x)
> >            s = abs(x[1])
> >            @inbounds @simd for I in eachindex(x)
> >                s = max(s, abs(x[I]))
> >            end
> >            s
> >        end
> > mymaxabs (generic function with 1 method)
> >
> > julia> x = randn(100000);
> >
> > # warmup suppressed
> >
> > julia> @time maxabs(x)
> >   0.000425 seconds (5 allocations: 176 bytes)
> > 4.513240114499124
> >
> > julia> @time mymaxabs(x)
> >   0.000642 seconds (5 allocations: 176 bytes)
> > 4.513240114499124
> >
> >
> > (It doesn't actually get SIMDed, though.)
> >
> > I'm not entirely surprised. Multiplication is fast, and with 10^5
> > elements the sqrt should not be the bottleneck.
> >
> > --Tim
> >
> > > On Wed, Nov 18, 2015 at 2:12 PM, Benjamin Deonovic <[email protected]> wrote:
> > > > Does norm use maxabs? If so, this could be due to maxabs getting
> > > > compiled. Try running both of the timed statements a second time.
> > > >
> > > > On Wednesday, November 18, 2015 at 10:41:48 AM UTC-6, Sisyphuss wrote:
> > > >> Interesting phenomenon: norm() is faster than maxabs()
> > > >>
> > > >> x = randn(100000)
> > > >> @time maxabs(x)
> > > >> @time norm(x)
> > > >>
> > > >>
> > > >> 0.000108 seconds (5 allocations: 176 bytes)
> > > >> 0.000040 seconds (5 allocations: 176 bytes)
> > > >>
> > > >> I would have thought the contrary, for norm() requires N squares
> > > >> and 1 square root; maxabs() requires 2N changes of sign bit and N
> > > >> comparisons.
>
>
