Re: [julia-users] dispatch slowdown when iterating over array with abstract values

Yichao Yu Mon, 04 Apr 2016 14:40:25 -0700

On Apr 4, 2016 1:36 PM, "Cedric St-Jean" <[email protected]> wrote:
>
> I'm not a compiler dev, but here's how I understand it:
>


Sorry forgot to reply....

>
> On Sunday, April 3, 2016 at 6:32:17 PM UTC-4, Greg Plowman wrote:
>>
>> It seems to me that slowness of dynamic dispatch and type instability
are orthogonal.
>>
>> I mean is dynamic dispatch inherently slow, or is it slow because it
involves type instability?
>
>
> Type instability (uncertainty) leads to multiple-dispatch, and multiple
dispatch leads to more type instability (in general, but not always), as
the return type is harder to pin down.
>
>>
>> @code_warntype slow(features) suggests that the return type from
evaluate() is Float64, so I can't see any type instability.
>> If this is correct, then why is dynamic dispatch so much slower than
run-time if statements?
>

I didn't know we are doing this optimization. I think the actual limit for
type inference is how many method actually matches. If you add more subtype
of Feature with different implementation of the function, I think you'll
see type inference give up at some point (the limit used to be 4 iirc)

Currently, whenever type inference cannot determine a single function to
call, it fallback to a full dynamic dispatch at runtime. This is of course
not the best way to do it See
https://github.com/JuliaLang/julia/issues/10805 and
https://github.com/JuliaLang/julia/pull/11862.

>
> Each object contains an ID which is an integer (or a pointer - I don't
know, but it's the same thing).
>
> if isa(x,A)
>
> checks wheter x's ID is equal to A's ID. That's an integer comparison,
and very fast. In contrast, multiple-dispatch involves looking up the
method table for which function to call when x is an Int. I don't know how
it's implemented, but a dictionary look-up is likely, and that's much more
costly.

A "hidden" cost of the dynamic dispatch is that the result needs to be
boxed. Since that's the only way one can return an arbitrary type object in
C. There are thought on how this can be improved but they involve more
complexity and it is not particularly clear if this particular case is
worth optimizing for (i.e. if it will regress more important cases for
example).

>
>>
>>
>> For my own understanding, I was almost hoping that slow() was not
type-stable,
>
>
> slow can be type-stable if the compiler assumes that no other
subtype/method will be added, which it seems to be doing. There's still the
multiple-dispatch cost, since it doesn't know which method to call. To
demonstrate the point about type-stability, if I define
>
> type C <: Feature end
> evaluate(f::C) = 1:3
>
> at the same time as I defined A and B, then slow becomes twice as slow as
before (430 microseconds vs. 210), even though I haven't even instantiated
any C object. That's because the compiler no longer assumes that `evaluate`
returns a Float (check code_warntype) and thus needs a second
multiple-dispatch to make the + call.
>
>>
>> and annotating with evaluate(features[i])::Float64 would let the
compiler produce faster code,
>> but it makes no difference.
>>
>>
>> On Sunday, April 3, 2016 at 11:07:44 PM UTC+10, Cedric St-Jean wrote:
>>>
>>> Good call, it was already pointed out in that thread.
>>>
>>> On Sat, Apr 2, 2016 at 11:11 PM, Yichao Yu <[email protected]> wrote:
>>>>
>>>> On Sat, Apr 2, 2016 at 10:53 PM, Cedric St-Jean <[email protected]>
wrote:
>>>> > That's actually a compiler bug, nice!
>>>> >
>>>> > abstract Feature
>>>> >
>>>> > type A <: Feature end
>>>> > evaluate(f::A) = 1.0
>>>> >
>>>> > foo(features::Vector{Feature}) = isa(features[1], A) ?
>>>> > evaluate(features[1]::A) : evaluate(features[1])
>>>> >
>>>> > @show foo(Feature[A()])
>>>> >
>>>> > type C <: Feature end
>>>> > evaluate(f::C) = 100
>>>> >
>>>> > @show foo(Feature[C()])
>>>> >
>>>> > yields
>>>> >
>>>> > foo(Feature[A()]) = 1.0
>>>> >
>>>> > foo(Feature[C()]) = 4.94e-322
>>>> >
>>>> >
>>>> > That explains why performance was the same on your computer: the
compiler
>>>> > was making an incorrect assumption about the return type of
`evaluate`. Or
>>>> > maybe it's an intentional gamble by the Julia devs, for the sake of
>>>> > performance.
>>>> >
>>>> > I couldn't find any issue describing this. Yichao?
>>>>
>>>> This is effectively #265. It's not always predictable what assumption
>>>> the compiler makes now...
>>>>
>>>> >
>>>> >
>>>> > On Saturday, April 2, 2016 at 10:16:59 PM UTC-4, Greg Plowman wrote:
>>>> >>
>>>> >> Thanks Cedric and Yichao.
>>>> >>
>>>> >> This makes sense that there might be new subtypes and associated
>>>> >> specialised methods. I understand that now. Thanks.
>>>> >>
>>>> >> On my machine (v0.4.5 Windows), fast() and pretty_fast() seem to
run in
>>>> >> similar time.
>>>> >> So I looked as @code_warntype as Yichao suggested and get the
following.
>>>> >> I don't fully know how to interpret output, but return type from
the final
>>>> >> "catchall" evaluate() seems to be inferred/asserted as Float64 (see
>>>> >> highlighted yellow line below)
>>>> >>
>>>> >> Would this explain why pretty_fast() seems to be as efficient as
fast()?
>>>> >>
>>>> >> Why is the return type being inferred asserted as Float64?
>>>> >>
>>>> >>
>>>> >> julia> @code_warntype fast(features)
>>>> >> Variables:
>>>> >>   features::Array{Feature,1}
>>>> >>   retval::Float64
>>>> >>   #s1::Int64
>>>> >>   i::Int64
>>>> >>
>>>> >> Body:
>>>> >>   begin  # none, line 2:
>>>> >>       retval = 0.0 # none, line 3:
>>>> >>       GenSym(2) = (Base.arraylen)(features::Array{Feature,1})::Int64
>>>> >>       GenSym(0) = $(Expr(:new, UnitRange{Int64}, 1,
>>>> >>
:(((top(getfield))(Base.Intrinsics,:select_value)::I)((Base.sle_int)(1,GenSym(2))::Bool,GenSym(2),(Base.box)
>>>> >> (Int64,(Base.sub_int)(1,1)))::Int64)))
>>>> >>       #s1 = (top(getfield))(GenSym(0),:start)::Int64
>>>> >>       unless (Base.box)(Base.Bool,(Base.not_int)(#s1::Int64 ===
>>>> >>
(Base.box)(Base.Int,(Base.add_int)((top(getfield))(GenSym(0),:stop)::Int64,1))::Bool))
>>>> >> goto 1
>>>> >>       2:
>>>> >>       GenSym(3) = #s1::Int64
>>>> >>       GenSym(4) = (Base.box)(Base.Int,(Base.add_int)(#s1::Int64,1))
>>>> >>       i = GenSym(3)
>>>> >>       #s1 = GenSym(4) # none, line 4:
>>>> >>       unless
>>>> >>
(Main.isa)((Base.arrayref)(features::Array{Feature,1},i::Int64)::Feature,Main.A)::Bool
>>>> >> goto 4 # none, line 5:
>>>> >>       retval =
>>>> >> (Base.box)(Base.Float64,(Base.add_float)(retval::Float64,1.0))
>>>> >>       goto 5
>>>> >>       4:  # none, line 7:
>>>> >>       retval =
>>>> >> (Base.box)(Base.Float64,(Base.add_float)(retval::Float64,0.0))
>>>> >>       5:
>>>> >>       3:
>>>> >>       unless
>>>> >>
(Base.box)(Base.Bool,(Base.not_int)((Base.box)(Base.Bool,(Base.not_int)(#s1::Int64
>>>> >> === (Base.box)(Base.Int,(Base.add_int)((top(getfield))(GenSym(0)
>>>> >> ,:stop)::Int64,1))::Bool)))) goto 2
>>>> >>       1:
>>>> >>       0:  # none, line 10:
>>>> >>       return retval::Float64
>>>> >>   end::Float64
>>>> >>
>>>> >>
>>>> >> julia> @code_warntype pretty_fast(features)
>>>> >> Variables:
>>>> >>   features::Array{Feature,1}
>>>> >>   retval::Float64
>>>> >>   #s1::Int64
>>>> >>   i::Int64
>>>> >>
>>>> >> Body:
>>>> >>   begin  # none, line 2:
>>>> >>       retval = 0.0 # none, line 3:
>>>> >>       GenSym(2) = (Base.arraylen)(features::Array{Feature,1})::Int64
>>>> >>       GenSym(0) = $(Expr(:new, UnitRange{Int64}, 1,
>>>> >>
:(((top(getfield))(Base.Intrinsics,:select_value)::I)((Base.sle_int)(1,GenSym(2))::Bool,GenSym(2),(Base.box)
>>>> >> (Int64,(Base.sub_int)(1,1)))::Int64)))
>>>> >>       #s1 = (top(getfield))(GenSym(0),:start)::Int64
>>>> >>       unless (Base.box)(Base.Bool,(Base.not_int)(#s1::Int64 ===
>>>> >>
(Base.box)(Base.Int,(Base.add_int)((top(getfield))(GenSym(0),:stop)::Int64,1))::Bool))
>>>> >> goto 1
>>>> >>       2:
>>>> >>       GenSym(4) = #s1::Int64
>>>> >>       GenSym(5) = (Base.box)(Base.Int,(Base.add_int)(#s1::Int64,1))
>>>> >>       i = GenSym(4)
>>>> >>       #s1 = GenSym(5) # none, line 4:
>>>> >>       unless
>>>> >>
(Main.isa)((Base.arrayref)(features::Array{Feature,1},i::Int64)::Feature,Main.A)::Bool
>>>> >> goto 4 # none, line 5:
>>>> >>       retval =
>>>> >> (Base.box)(Base.Float64,(Base.add_float)(retval::Float64,1.0))
>>>> >>       goto 6
>>>> >>       4:  # none, line 6:
>>>> >>       unless
>>>> >>
(Main.isa)((Base.arrayref)(features::Array{Feature,1},i::Int64)::Feature,Main.B)::Bool
>>>> >> goto 5 # none, line 7:
>>>> >>       retval =
>>>> >> (Base.box)(Base.Float64,(Base.add_float)(retval::Float64,0.0))
>>>> >>       goto 6
>>>> >>       5:  # none, line 9:
>>>> >>       GenSym(3) =
>>>> >>
(Main.evaluate)((Base.arrayref)(features::Array{Feature,1},i::Int64)::Feature)::Float64
>>>> >>       retval =
>>>> >> (Base.box)(Base.Float64,(Base.add_float)(retval::Float64,GenSym(3)))
>>>> >>       6:
>>>> >>       3:
>>>> >>       unless
>>>> >>
(Base.box)(Base.Bool,(Base.not_int)((Base.box)(Base.Bool,(Base.not_int)(#s1::Int64
>>>> >> === (Base.box)(Base.Int,(Base.add_int)((top(getfield))(GenSym(0)
>>>> >> ,:stop)::Int64,1))::Bool)))) goto 2
>>>> >>       1:
>>>> >>       0:  # none, line 12:
>>>> >>       return retval::Float64
>>>> >>   end::Float64
>>>> >>
>>>> >> Julia>
>>>> >>
>>>> >>
>>>> >>
>>>> >> On Sunday, April 3, 2016 at 8:04:35 AM UTC+10, Cedric St-Jean wrote:
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>> On Saturday, April 2, 2016 at 5:39:45 PM UTC-4, Greg Plowman wrote:
>>>> >>>>
>>>> >>>> Cedric,
>>>> >>>> On my machine fast() and pretty_fast() run in the roughly the
same time.
>>>> >>>> Are you sure pre-compiled first?
>>>> >>>
>>>> >>>
>>>> >>> Yep. I'm using @benchmark on Julia 0.4.5 on OSX, and the time
difference
>>>> >>> is definitely > 2X.
>>>> >>>
>>>> >>>>
>>>> >>>>
>>>> >>>> 1. If you add a default fallback method, say,
evaluate(f::Feature) =
>>>> >>>> -1.0
>>>> >>>> Theoretically, would that inform the compiler that evaluate()
always
>>>> >>>> returns a Float64 and therefore is type stable?
>>>> >>>
>>>> >>>
>>>> >>> No, because I might write
>>>> >>>
>>>> >>> type C <: Feature
>>>> >>> end
>>>> >>> evaluate(f::C) = 100
>>>> >>>
>>>> >>> and a Vector{Feature} might contain a C. Abstract types are
open-ended,
>>>> >>> new types can be created a runtime, so the compiler can't assume
anything
>>>> >>> about them, which is why they're not useful for performance. See
here.
>>>> >>>
>>>> >>>
>>>> >>>>
>>>> >>>> Does dynamic dispatch mean compiler has to look up method at run
time
>>>> >>>> (because it can't know type ahead of time).
>>>> >>>> This is equivalent to explicitly coding "static dispatch" with if
>>>> >>>> statements?
>>>> >>>> If so then why is it so much slower?
>>>> >>>> Providing a fallback method as in 1. or explicitly returning
Float64
>>>> >>>> doesn't seem to improve speed, which naively suggests that
slowness is from
>>>> >>>> dynamic dispatch not from type instability???
>>>> >>>>
>>>> >>>> 3. Also, on my machine pretty_fast() is as fast as fast(). Why is
this
>>>> >>>> so, if pretty_fast() is supposedly type unstable?
>>>> >>>>
>>>> >>>> As you can probably tell, I'm pretty confused on this.
>>>> >>>>
>>>> >>>>
>>>> >>>> On Sunday, April 3, 2016 at 6:34:09 AM UTC+10, Cedric St-Jean
wrote:
>>>> >>>>>
>>>> >>>>> Thank you for the detailed explanation. I tried it out:
>>>> >>>>>
>>>> >>>>> function pretty_fast(features::Vector{Feature})
>>>> >>>>>     retval = 0.0
>>>> >>>>>     for i in 1 : length(features)
>>>> >>>>>         if isa(features[i], A)
>>>> >>>>>             x = evaluate(features[i]::A)
>>>> >>>>>         elseif isa(features[i], B)
>>>> >>>>>             x = evaluate(features[i]::B)
>>>> >>>>>         else
>>>> >>>>>             x = evaluate(features[i])
>>>> >>>>>         end
>>>> >>>>>         retval += x
>>>> >>>>>     end
>>>> >>>>>     retval
>>>> >>>>> end
>>>> >>>>>
>>>> >>>>> On my laptop, fast runs in 10 microseconds, pretty_fast in 30,
and slow
>>>> >>>>> in 210.
>>>> >>>>>
>>>> >>>>> On Saturday, April 2, 2016 at 12:24:18 PM UTC-4, Yichao Yu wrote:
>>>> >>>>>>
>>>> >>>>>> On Sat, Apr 2, 2016 at 12:16 PM, Tim Wheeler <
[email protected]>
>>>> >>>>>> wrote:
>>>> >>>>>> > Thank you for the comments. In my original code it means the
>>>> >>>>>> > difference
>>>> >>>>>> > between a 30 min execution with memory allocation in the
Gigabytes
>>>> >>>>>> > and a few
>>>> >>>>>> > seconds of execution with only 800 bytes using the second
version.
>>>> >>>>>> > I thought under-the-hood Julia basically runs those if
statements
>>>> >>>>>> > anyway for
>>>> >>>>>> > its dispatch, and don't know why it needs to allocate any
memory.
>>>> >>>>>> > Having the if-statement workaround will be fine though.
>>>> >>>>>>
>>>> >>>>>> Well, if you have a lot of these cheap functions being
dynamically
>>>> >>>>>> dispatched I think it is not a good way to use the type.
Depending on
>>>> >>>>>> your problem, you may be better off using a enum/flags/dict to
>>>> >>>>>> represent the type/get the values.
>>>> >>>>>>
>>>> >>>>>> The reason for the allocation is that the return type is
unknown. It
>>>> >>>>>> should be obvious to see if you check your code with
code_warntype.
>>>> >>>>>>
>>>> >>>>>> >
>>>> >>>>>> > On Saturday, April 2, 2016 at 7:26:11 AM UTC-7, Cedric St-Jean
>>>> >>>>>> > wrote:
>>>> >>>>>> >>
>>>> >>>>>> >>
>>>> >>>>>> >>> Therefore there's no way the compiler can rewrite the slow
version
>>>> >>>>>> >>> to the
>>>> >>>>>> >>> fast version.
>>>> >>>>>> >>
>>>> >>>>>> >>
>>>> >>>>>> >> It knows that the element type is a Feature, so it could
produce:
>>>> >>>>>> >>
>>>> >>>>>> >> if isa(features[i], A)
>>>> >>>>>> >>     retval += evaluate(features[i]::A)
>>>> >>>>>> >> elseif isa(features[i], B)
>>>> >>>>>> >>     retval += evaluate(features[i]::B)
>>>> >>>>>> >> else
>>>> >>>>>> >>     retval += evaluate(features[i])
>>>> >>>>>> >> end
>>>> >>>>>> >>
>>>> >>>>>> >> and it would make sense for abstract types that have few
subtypes.
>>>> >>>>>> >> I
>>>> >>>>>> >> didn't realize that dispatch was an order of magnitude
slower than
>>>> >>>>>> >> type
>>>> >>>>>> >> checking. It's easy enough to write a macro generating this
>>>> >>>>>> >> expansion, too.
>>>> >>>>>> >>
>>>> >>>>>> >> On Saturday, April 2, 2016 at 2:05:20 AM UTC-4, Yichao Yu
wrote:
>>>> >>>>>> >>>
>>>> >>>>>> >>> On Fri, Apr 1, 2016 at 9:56 PM, Tim Wheeler
>>>> >>>>>> >>> <[email protected]>
>>>> >>>>>> >>> wrote:
>>>> >>>>>> >>> > Hello Julia Users.
>>>> >>>>>> >>> >
>>>> >>>>>> >>> > I ran into a weird slowdown issue and reproduced a minimal
>>>> >>>>>> >>> > working
>>>> >>>>>> >>> > example.
>>>> >>>>>> >>> > Maybe someone can help shed some light.
>>>> >>>>>> >>> >
>>>> >>>>>> >>> > abstract Feature
>>>> >>>>>> >>> >
>>>> >>>>>> >>> > type A <: Feature end
>>>> >>>>>> >>> > evaluate(f::A) = 1.0
>>>> >>>>>> >>> >
>>>> >>>>>> >>> > type B <: Feature end
>>>> >>>>>> >>> > evaluate(f::B) = 0.0
>>>> >>>>>> >>> >
>>>> >>>>>> >>> > function slow(features::Vector{Feature})
>>>> >>>>>> >>> >     retval = 0.0
>>>> >>>>>> >>> >     for i in 1 : length(features)
>>>> >>>>>> >>> >         retval += evaluate(features[i])
>>>> >>>>>> >>> >     end
>>>> >>>>>> >>> >     retval
>>>> >>>>>> >>> > end
>>>> >>>>>> >>> >
>>>> >>>>>> >>> > function fast(features::Vector{Feature})
>>>> >>>>>> >>> >     retval = 0.0
>>>> >>>>>> >>> >     for i in 1 : length(features)
>>>> >>>>>> >>> >         if isa(features[i], A)
>>>> >>>>>> >>> >             retval += evaluate(features[i]::A)
>>>> >>>>>> >>> >         else
>>>> >>>>>> >>> >             retval += evaluate(features[i]::B)
>>>> >>>>>> >>> >         end
>>>> >>>>>> >>> >     end
>>>> >>>>>> >>> >     retval
>>>> >>>>>> >>> > end
>>>> >>>>>> >>> >
>>>> >>>>>> >>> > using ProfileView
>>>> >>>>>> >>> >
>>>> >>>>>> >>> > features = Feature[]
>>>> >>>>>> >>> > for i in 1 : 10000
>>>> >>>>>> >>> >     push!(features, A())
>>>> >>>>>> >>> > end
>>>> >>>>>> >>> >
>>>> >>>>>> >>> > slow(features)
>>>> >>>>>> >>> > @time slow(features)
>>>> >>>>>> >>> > fast(features)
>>>> >>>>>> >>> > @time fast(features)
>>>> >>>>>> >>> >
>>>> >>>>>> >>> > The output is:
>>>> >>>>>> >>> >
>>>> >>>>>> >>> > 0.000136 seconds (10.15 k allocations: 166.417 KB)
>>>> >>>>>> >>> > 0.000012 seconds (5 allocations: 176 bytes)
>>>> >>>>>> >>> >
>>>> >>>>>> >>> >
>>>> >>>>>> >>> > This is a HUGE difference! Am I missing something big? Is
there
>>>> >>>>>> >>> > a good
>>>> >>>>>> >>> > way
>>>> >>>>>> >>> > to inspect code to figure out where I am going wrong?
>>>> >>>>>> >>>
>>>> >>>>>> >>> This is because of type instability as you will find in the
>>>> >>>>>> >>> performance
>>>> >>>>>> >>> tips.
>>>> >>>>>> >>> Note that slow and fast are not equivalent since the fast
version
>>>> >>>>>> >>> only
>>>> >>>>>> >>> accept `A` or `B` but the slow version accepts any subtype
of
>>>> >>>>>> >>> feature
>>>> >>>>>> >>> that you may ever define. Therefore there's no way the
compiler
>>>> >>>>>> >>> can
>>>> >>>>>> >>> rewrite the slow version to the fast version.
>>>> >>>>>> >>> There are optimizations that can be applied to bring down
the gap
>>>> >>>>>> >>> but
>>>> >>>>>> >>> there'll always be a large difference between the two.
>>>> >>>>>> >>>
>>>> >>>>>> >>> >
>>>> >>>>>> >>> >
>>>> >>>>>> >>> > Thank you in advance for any guidance.
>>>> >>>>>> >>> >
>>>> >>>>>> >>> >
>>>> >>>>>> >>> > -Tim
>>>> >>>>>> >>> >
>>>> >>>>>> >>> >
>>>> >>>>>> >>> >
>>>> >>>>>> >>> >
>>>> >>>>>> >>> >
>>>
>>>

Re: [julia-users] dispatch slowdown when iterating over array with abstract values

Reply via email to