I'm not a compiler dev, but here's how I understand it:

On Sunday, April 3, 2016 at 6:32:17 PM UTC-4, Greg Plowman wrote:
>
> It seems to me that slowness of dynamic dispatch and type instability are
> orthogonal.
> I mean, is dynamic dispatch inherently slow, or is it slow because it
> involves type instability?
>
Type instability (uncertainty) leads to dynamic dispatch, and dynamic
dispatch leads to more type instability (in general, but not always), as
the return type is harder to pin down.

> @code_warntype slow(features) suggests that the return type from
> evaluate() is Float64, so I can't see any type instability.
> If this is correct, then why is dynamic dispatch so much slower than
> run-time if statements?

Each object contains an ID, which is an integer (or a pointer - I don't
know which, but it's the same thing). isa(x, A) checks whether x's ID is
equal to A's ID. That's an integer comparison, and very fast. In contrast,
dynamic dispatch involves looking up the method table for which method to
call when x is an Int. I don't know how it's implemented, but a dictionary
lookup is likely, and that's much more costly.

> For my own understanding, I was almost hoping that slow() was not
> type-stable,

slow() can be type-stable if the compiler assumes that no other
subtype/method will be added, which it seems to be doing. There's still
the dynamic-dispatch cost, since it doesn't know which method to call.

To demonstrate the point about type stability: if I define

type C <: Feature end
evaluate(f::C) = 1:3

*at the same time as I defined A and B*, then slow() becomes twice as slow
as before (430 microseconds vs. 210), even though I haven't even
instantiated any C object. That's because the compiler no longer assumes
that `evaluate` returns a Float64 (check @code_warntype) and thus needs a
second dynamic dispatch to make the + call.

> and annotating with evaluate(features[i])::Float64 would let the compiler
> produce faster code,
> but it makes no difference.
>
> On Sunday, April 3, 2016 at 11:07:44 PM UTC+10, Cedric St-Jean wrote:
>>
>> Good call, it was already pointed out
>> <https://github.com/JuliaLang/julia/issues/265#issuecomment-42047587>
>> in that thread.
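[Editor's sketch, not part of the thread: the isa-check vs. method-lookup
contrast above, as a minimal self-contained example. It uses current
`abstract type`/`struct` syntax rather than the 0.4-era `type` used in
this thread; the names `manual_dispatch` and `dynamic_dispatch` are mine.]

```julia
# The two dispatch strategies discussed above.
abstract type Feature end
struct A <: Feature end
struct B <: Feature end
evaluate(f::A) = 1.0
evaluate(f::B) = 0.0

# Hand-written "static" dispatch: isa() is a cheap type-tag comparison,
# and each evaluate() call below is resolved at compile time because the
# argument type is asserted to be concrete.
function manual_dispatch(features::Vector{Feature})
    retval = 0.0
    for f in features
        if isa(f, A)
            retval += evaluate(f::A)
        else
            retval += evaluate(f::B)
        end
    end
    retval
end

# Dynamic dispatch: the method to call is looked up in the method table
# at run time, once per element.
function dynamic_dispatch(features::Vector{Feature})
    retval = 0.0
    for f in features
        retval += evaluate(f)
    end
    retval
end
```

Both functions compute the same sum; only the per-element dispatch cost
differs.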
>>
>> On Sat, Apr 2, 2016 at 11:11 PM, Yichao Yu <[email protected]> wrote:
>>
>>> On Sat, Apr 2, 2016 at 10:53 PM, Cedric St-Jean <[email protected]>
>>> wrote:
>>> > That's actually a compiler bug, nice!
>>> >
>>> > abstract Feature
>>> >
>>> > type A <: Feature end
>>> > evaluate(f::A) = 1.0
>>> >
>>> > foo(features::Vector{Feature}) = isa(features[1], A) ?
>>> >     evaluate(features[1]::A) : evaluate(features[1])
>>> >
>>> > @show foo(Feature[A()])
>>> >
>>> > type C <: Feature end
>>> > evaluate(f::C) = 100
>>> >
>>> > @show foo(Feature[C()])
>>> >
>>> > yields
>>> >
>>> > foo(Feature[A()]) = 1.0
>>> > foo(Feature[C()]) = 4.94e-322
>>> >
>>> > That explains why performance was the same on your computer: the
>>> > compiler was making an incorrect assumption about the return type of
>>> > `evaluate`. Or maybe it's an intentional gamble by the Julia devs,
>>> > for the sake of performance.
>>> >
>>> > I couldn't find any issue describing this. Yichao?
>>>
>>> This is effectively #265. It's not always predictable what assumption
>>> the compiler makes now...
>>>
>>> > On Saturday, April 2, 2016 at 10:16:59 PM UTC-4, Greg Plowman wrote:
>>> >>
>>> >> Thanks Cedric and Yichao.
>>> >>
>>> >> It makes sense that there might be new subtypes and associated
>>> >> specialised methods. I understand that now. Thanks.
>>> >>
>>> >> On my machine (v0.4.5 Windows), fast() and pretty_fast() seem to run
>>> >> in similar time.
>>> >> So I looked at @code_warntype as Yichao suggested and got the
>>> >> following. I don't fully know how to interpret the output, but the
>>> >> return type from the final "catchall" evaluate() seems to be
>>> >> inferred/asserted as Float64 (see highlighted yellow line below).
>>> >>
>>> >> Would this explain why pretty_fast() seems to be as efficient as
>>> >> fast()?
>>> >>
>>> >> Why is the return type being inferred/asserted as Float64?
>>> >>
>>> >> julia> @code_warntype fast(features)
>>> >> Variables:
>>> >>   features::Array{Feature,1}
>>> >>   retval::Float64
>>> >>   #s1::Int64
>>> >>   i::Int64
>>> >>
>>> >> Body:
>>> >>   begin  # none, line 2:
>>> >>       retval = 0.0 # none, line 3:
>>> >>       GenSym(2) = (Base.arraylen)(features::Array{Feature,1})::Int64
>>> >>       GenSym(0) = $(Expr(:new, UnitRange{Int64}, 1, :(((top(getfield))(Base.Intrinsics,:select_value)::I)((Base.sle_int)(1,GenSym(2))::Bool,GenSym(2),(Base.box)(Int64,(Base.sub_int)(1,1)))::Int64)))
>>> >>       #s1 = (top(getfield))(GenSym(0),:start)::Int64
>>> >>       unless (Base.box)(Base.Bool,(Base.not_int)(#s1::Int64 === (Base.box)(Base.Int,(Base.add_int)((top(getfield))(GenSym(0),:stop)::Int64,1))::Bool)) goto 1
>>> >>       2:
>>> >>       GenSym(3) = #s1::Int64
>>> >>       GenSym(4) = (Base.box)(Base.Int,(Base.add_int)(#s1::Int64,1))
>>> >>       i = GenSym(3)
>>> >>       #s1 = GenSym(4) # none, line 4:
>>> >>       unless (Main.isa)((Base.arrayref)(features::Array{Feature,1},i::Int64)::Feature,Main.A)::Bool goto 4 # none, line 5:
>>> >>       retval = (Base.box)(Base.Float64,(Base.add_float)(retval::Float64,1.0))
>>> >>       goto 5
>>> >>       4: # none, line 7:
>>> >>       retval = (Base.box)(Base.Float64,(Base.add_float)(retval::Float64,0.0))
>>> >>       5:
>>> >>       3:
>>> >>       unless (Base.box)(Base.Bool,(Base.not_int)((Base.box)(Base.Bool,(Base.not_int)(#s1::Int64 === (Base.box)(Base.Int,(Base.add_int)((top(getfield))(GenSym(0),:stop)::Int64,1))::Bool)))) goto 2
>>> >>       1:
>>> >>       0: # none, line 10:
>>> >>       return retval::Float64
>>> >>   end::Float64
>>> >>
>>> >> julia> @code_warntype pretty_fast(features)
>>> >> Variables:
>>> >>   features::Array{Feature,1}
>>> >>   retval::Float64
>>> >>   #s1::Int64
>>> >>   i::Int64
>>> >>
>>> >> Body:
>>> >>   begin  # none, line 2:
>>> >>       retval = 0.0 # none, line 3:
>>> >>       GenSym(2) = (Base.arraylen)(features::Array{Feature,1})::Int64
>>> >>       GenSym(0) = $(Expr(:new, UnitRange{Int64}, 1, :(((top(getfield))(Base.Intrinsics,:select_value)::I)((Base.sle_int)(1,GenSym(2))::Bool,GenSym(2),(Base.box)(Int64,(Base.sub_int)(1,1)))::Int64)))
>>> >>       #s1 = (top(getfield))(GenSym(0),:start)::Int64
>>> >>       unless (Base.box)(Base.Bool,(Base.not_int)(#s1::Int64 === (Base.box)(Base.Int,(Base.add_int)((top(getfield))(GenSym(0),:stop)::Int64,1))::Bool)) goto 1
>>> >>       2:
>>> >>       GenSym(4) = #s1::Int64
>>> >>       GenSym(5) = (Base.box)(Base.Int,(Base.add_int)(#s1::Int64,1))
>>> >>       i = GenSym(4)
>>> >>       #s1 = GenSym(5) # none, line 4:
>>> >>       unless (Main.isa)((Base.arrayref)(features::Array{Feature,1},i::Int64)::Feature,Main.A)::Bool goto 4 # none, line 5:
>>> >>       retval = (Base.box)(Base.Float64,(Base.add_float)(retval::Float64,1.0))
>>> >>       goto 6
>>> >>       4: # none, line 6:
>>> >>       unless (Main.isa)((Base.arrayref)(features::Array{Feature,1},i::Int64)::Feature,Main.B)::Bool goto 5 # none, line 7:
>>> >>       retval = (Base.box)(Base.Float64,(Base.add_float)(retval::Float64,0.0))
>>> >>       goto 6
>>> >>       5: # none, line 9:
>>> >>       GenSym(3) = (Main.evaluate)((Base.arrayref)(features::Array{Feature,1},i::Int64)::Feature)::Float64
>>> >>       retval = (Base.box)(Base.Float64,(Base.add_float)(retval::Float64,GenSym(3)))
>>> >>       6:
>>> >>       3:
>>> >>       unless (Base.box)(Base.Bool,(Base.not_int)((Base.box)(Base.Bool,(Base.not_int)(#s1::Int64 === (Base.box)(Base.Int,(Base.add_int)((top(getfield))(GenSym(0),:stop)::Int64,1))::Bool)))) goto 2
>>> >>       1:
>>> >>       0: # none, line 12:
>>> >>       return retval::Float64
>>> >>   end::Float64
>>> >>
>>> >> julia>
>>> >>
>>> >> On Sunday, April 3, 2016 at 8:04:35 AM UTC+10, Cedric St-Jean wrote:
>>> >>>
>>> >>> On Saturday, April 2, 2016 at 5:39:45 PM UTC-4, Greg Plowman wrote:
>>> >>>>
>>> >>>> Cedric,
>>> >>>> On my machine fast() and pretty_fast() run in roughly the same time.
>>> >>>> Are you sure you pre-compiled first?
>>> >>>
>>> >>> Yep. I'm using @benchmark on Julia 0.4.5 on OSX, and the time
>>> >>> difference is definitely > 2X.
>>> >>>
>>> >>>> 1. If you add a default fallback method, say, evaluate(f::Feature) =
>>> >>>> -1.0
>>> >>>> Theoretically, would that inform the compiler that evaluate() always
>>> >>>> returns a Float64 and therefore is type-stable?
>>> >>>
>>> >>> No, because I might write
>>> >>>
>>> >>> type C <: Feature
>>> >>> end
>>> >>> evaluate(f::C) = 100
>>> >>>
>>> >>> and a Vector{Feature} might contain a C. Abstract types are
>>> >>> open-ended: new types can be created at runtime, so the compiler
>>> >>> can't assume anything about them, which is why they're not useful
>>> >>> for performance. See here.
>>> >>>
>>> >>>> 2. Does dynamic dispatch mean the compiler has to look up the
>>> >>>> method at run time (because it can't know the type ahead of time)?
>>> >>>> Is this equivalent to explicitly coding "static dispatch" with if
>>> >>>> statements? If so, then why is it so much slower?
>>> >>>> Providing a fallback method as in 1. or explicitly returning
>>> >>>> Float64 doesn't seem to improve speed, which naively suggests that
>>> >>>> the slowness is from dynamic dispatch, not from type instability???
>>> >>>>
>>> >>>> 3. Also, on my machine pretty_fast() is as fast as fast(). Why is
>>> >>>> this so, if pretty_fast() is supposedly type-unstable?
>>> >>>>
>>> >>>> As you can probably tell, I'm pretty confused about this.
>>> >>>>
>>> >>>> On Sunday, April 3, 2016 at 6:34:09 AM UTC+10, Cedric St-Jean wrote:
>>> >>>>>
>>> >>>>> Thank you for the detailed explanation.
I tried it out:
>>> >>>>>
>>> >>>>> function pretty_fast(features::Vector{Feature})
>>> >>>>>     retval = 0.0
>>> >>>>>     for i in 1 : length(features)
>>> >>>>>         if isa(features[i], A)
>>> >>>>>             x = evaluate(features[i]::A)
>>> >>>>>         elseif isa(features[i], B)
>>> >>>>>             x = evaluate(features[i]::B)
>>> >>>>>         else
>>> >>>>>             x = evaluate(features[i])
>>> >>>>>         end
>>> >>>>>         retval += x
>>> >>>>>     end
>>> >>>>>     retval
>>> >>>>> end
>>> >>>>>
>>> >>>>> On my laptop, fast runs in 10 microseconds, pretty_fast in 30, and
>>> >>>>> slow in 210.
>>> >>>>>
>>> >>>>> On Saturday, April 2, 2016 at 12:24:18 PM UTC-4, Yichao Yu wrote:
>>> >>>>>>
>>> >>>>>> On Sat, Apr 2, 2016 at 12:16 PM, Tim Wheeler
>>> >>>>>> <[email protected]> wrote:
>>> >>>>>> > Thank you for the comments. In my original code it means the
>>> >>>>>> > difference between a 30 min execution with memory allocation in
>>> >>>>>> > the Gigabytes and a few seconds of execution with only 800
>>> >>>>>> > bytes using the second version.
>>> >>>>>> > I thought under-the-hood Julia basically runs those if
>>> >>>>>> > statements anyway for its dispatch, and don't know why it needs
>>> >>>>>> > to allocate any memory.
>>> >>>>>> > Having the if-statement workaround will be fine though.
>>> >>>>>>
>>> >>>>>> Well, if you have a lot of these cheap functions being
>>> >>>>>> dynamically dispatched, I think it is not a good way to use the
>>> >>>>>> type system. Depending on your problem, you may be better off
>>> >>>>>> using an enum/flags/dict to represent the type/get the values.
>>> >>>>>>
>>> >>>>>> The reason for the allocation is that the return type is unknown.
>>> >>>>>> It should be obvious to see if you check your code with
>>> >>>>>> code_warntype.
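[Editor's sketch, not from the thread: Yichao's enum suggestion might look
like the following, with hypothetical names (`FeatureKind`,
`evaluate_kind`, `total`). Since the enum is a concrete bits type, every
call is statically resolved and the loop stays type-stable.]

```julia
# Hypothetical enum-based alternative to one-type-per-feature.
@enum FeatureKind FEATURE_A FEATURE_B

# A single method on the concrete enum type: no dynamic dispatch,
# and the return type is always Float64.
evaluate_kind(k::FeatureKind) = k == FEATURE_A ? 1.0 : 0.0

function total(kinds::Vector{FeatureKind})
    retval = 0.0
    for k in kinds
        retval += evaluate_kind(k)  # statically resolved call
    end
    retval
end
```

The trade-off is flexibility: adding a new feature means editing the enum
and the branch, rather than defining a new subtype and method.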
>>> >>>>>>
>>> >>>>>> > On Saturday, April 2, 2016 at 7:26:11 AM UTC-7, Cedric St-Jean
>>> >>>>>> > wrote:
>>> >>>>>> >>
>>> >>>>>> >>> Therefore there's no way the compiler can rewrite the slow
>>> >>>>>> >>> version to the fast version.
>>> >>>>>> >>
>>> >>>>>> >> It knows that the element type is a Feature, so it could
>>> >>>>>> >> produce:
>>> >>>>>> >>
>>> >>>>>> >> if isa(features[i], A)
>>> >>>>>> >>     retval += evaluate(features[i]::A)
>>> >>>>>> >> elseif isa(features[i], B)
>>> >>>>>> >>     retval += evaluate(features[i]::B)
>>> >>>>>> >> else
>>> >>>>>> >>     retval += evaluate(features[i])
>>> >>>>>> >> end
>>> >>>>>> >>
>>> >>>>>> >> and it would make sense for abstract types that have few
>>> >>>>>> >> subtypes. I didn't realize that dispatch was an order of
>>> >>>>>> >> magnitude slower than type checking. It's easy enough to write
>>> >>>>>> >> a macro generating this expansion, too.
>>> >>>>>> >>
>>> >>>>>> >> On Saturday, April 2, 2016 at 2:05:20 AM UTC-4, Yichao Yu wrote:
>>> >>>>>> >>>
>>> >>>>>> >>> On Fri, Apr 1, 2016 at 9:56 PM, Tim Wheeler
>>> >>>>>> >>> <[email protected]> wrote:
>>> >>>>>> >>> > Hello Julia Users.
>>> >>>>>> >>> >
>>> >>>>>> >>> > I ran into a weird slowdown issue and reproduced a minimal
>>> >>>>>> >>> > working example.
>>> >>>>>> >>> > Maybe someone can help shed some light.
>>> >>>>>> >>> >
>>> >>>>>> >>> > abstract Feature
>>> >>>>>> >>> >
>>> >>>>>> >>> > type A <: Feature end
>>> >>>>>> >>> > evaluate(f::A) = 1.0
>>> >>>>>> >>> >
>>> >>>>>> >>> > type B <: Feature end
>>> >>>>>> >>> > evaluate(f::B) = 0.0
>>> >>>>>> >>> >
>>> >>>>>> >>> > function slow(features::Vector{Feature})
>>> >>>>>> >>> >     retval = 0.0
>>> >>>>>> >>> >     for i in 1 : length(features)
>>> >>>>>> >>> >         retval += evaluate(features[i])
>>> >>>>>> >>> >     end
>>> >>>>>> >>> >     retval
>>> >>>>>> >>> > end
>>> >>>>>> >>> >
>>> >>>>>> >>> > function fast(features::Vector{Feature})
>>> >>>>>> >>> >     retval = 0.0
>>> >>>>>> >>> >     for i in 1 : length(features)
>>> >>>>>> >>> >         if isa(features[i], A)
>>> >>>>>> >>> >             retval += evaluate(features[i]::A)
>>> >>>>>> >>> >         else
>>> >>>>>> >>> >             retval += evaluate(features[i]::B)
>>> >>>>>> >>> >         end
>>> >>>>>> >>> >     end
>>> >>>>>> >>> >     retval
>>> >>>>>> >>> > end
>>> >>>>>> >>> >
>>> >>>>>> >>> > using ProfileView
>>> >>>>>> >>> >
>>> >>>>>> >>> > features = Feature[]
>>> >>>>>> >>> > for i in 1 : 10000
>>> >>>>>> >>> >     push!(features, A())
>>> >>>>>> >>> > end
>>> >>>>>> >>> >
>>> >>>>>> >>> > slow(features)
>>> >>>>>> >>> > @time slow(features)
>>> >>>>>> >>> > fast(features)
>>> >>>>>> >>> > @time fast(features)
>>> >>>>>> >>> >
>>> >>>>>> >>> > The output is:
>>> >>>>>> >>> >
>>> >>>>>> >>> > 0.000136 seconds (10.15 k allocations: 166.417 KB)
>>> >>>>>> >>> > 0.000012 seconds (5 allocations: 176 bytes)
>>> >>>>>> >>> >
>>> >>>>>> >>> > This is a HUGE difference! Am I missing something big? Is
>>> >>>>>> >>> > there a good way to inspect code to figure out where I am
>>> >>>>>> >>> > going wrong?
>>> >>>>>> >>>
>>> >>>>>> >>> This is because of type instability, as you will find in the
>>> >>>>>> >>> performance tips.
>>> >>>>>> >>> Note that slow and fast are not equivalent, since the fast
>>> >>>>>> >>> version only accepts `A` or `B` but the slow version accepts
>>> >>>>>> >>> any subtype of Feature that you may ever define. Therefore
>>> >>>>>> >>> there's no way the compiler can rewrite the slow version to
>>> >>>>>> >>> the fast version.
>>> >>>>>> >>> There are optimizations that can be applied to bring down the
>>> >>>>>> >>> gap, but there'll always be a large difference between the two.
>>> >>>>>> >>>
>>> >>>>>> >>> > Thank you in advance for any guidance.
>>> >>>>>> >>> >
>>> >>>>>> >>> > -Tim
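
[Editor's sketch, not from the thread: Cedric's remark that "it's easy
enough to write a macro generating this expansion" could look like the
hypothetical `@unroll_dispatch` macro below. It expands `f(x)` into the
isa-chain from his message, falling back to ordinary dynamic dispatch for
any subtype not listed. Written in current Julia syntax.]

```julia
# Hypothetical @unroll_dispatch macro: expands f(x) over the listed
# concrete subtypes into an isa() chain with type assertions.
macro unroll_dispatch(f, x, types...)
    ex = :($(esc(f))($(esc(x))))  # fallback: plain dynamic dispatch
    for T in reverse(types)       # build the chain from the inside out
        ex = :(isa($(esc(x)), $(esc(T))) ? $(esc(f))($(esc(x))::$(esc(T))) : $ex)
    end
    ex
end

# The thread's types, for demonstration:
abstract type Feature end
struct A <: Feature end
struct B <: Feature end
evaluate(f::A) = 1.0
evaluate(f::B) = 0.0

# Equivalent to the hand-written fast(), but generated by the macro.
function fast_macro(features::Vector{Feature})
    retval = 0.0
    for i in 1:length(features)
        retval += @unroll_dispatch(evaluate, features[i], A, B)
    end
    retval
end
```

Here `@unroll_dispatch(evaluate, features[i], A, B)` expands to
`isa(features[i], A) ? evaluate(features[i]::A) : (isa(features[i], B) ?
evaluate(features[i]::B) : evaluate(features[i]))`, so listed subtypes get
statically resolved calls and anything else still works, just slowly.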
