On Apr 4, 2016 1:36 PM, "Cedric St-Jean" <[email protected]> wrote: > > I'm not a compiler dev, but here's how I understand it: >
Sorry forgot to reply.... > > On Sunday, April 3, 2016 at 6:32:17 PM UTC-4, Greg Plowman wrote: >> >> It seems to me that slowness of dynamic dispatch and type instability are orthogonal. >> >> I mean is dynamic dispatch inherently slow, or is it slow because it involves type instability? > > > Type instability (uncertainty) leads to multiple-dispatch, and multiple dispatch leads to more type instability (in general, but not always), as the return type is harder to pin down. > >> >> @code_warntype slow(features) suggests that the return type from evaluate() is Float64, so I can't see any type instability. >> If this is correct, then why is dynamic dispatch so much slower than run-time if statements? > I didn't know we are doing this optimization. I think the actual limit for type inference is how many method actually matches. If you add more subtype of Feature with different implementation of the function, I think you'll see type inference give up at some point (the limit used to be 4 iirc) Currently, whenever type inference cannot determine a single function to call, it fallback to a full dynamic dispatch at runtime. This is of course not the best way to do it See https://github.com/JuliaLang/julia/issues/10805 and https://github.com/JuliaLang/julia/pull/11862. > > Each object contains an ID which is an integer (or a pointer - I don't know, but it's the same thing). > > if isa(x,A) > > checks wheter x's ID is equal to A's ID. That's an integer comparison, and very fast. In contrast, multiple-dispatch involves looking up the method table for which function to call when x is an Int. I don't know how it's implemented, but a dictionary look-up is likely, and that's much more costly. A "hidden" cost of the dynamic dispatch is that the result needs to be boxed. Since that's the only way one can return an arbitrary type object in C. There are thought on how this can be improved but they involve more complexity and it is not particularly clear if this particular case is worth optimizing for (i.e. if it will regress more important cases for example). > >> >> >> For my own understanding, I was almost hoping that slow() was not type-stable, > > > slow can be type-stable if the compiler assumes that no other subtype/method will be added, which it seems to be doing. There's still the multiple-dispatch cost, since it doesn't know which method to call. To demonstrate the point about type-stability, if I define > > type C <: Feature end > evaluate(f::C) = 1:3 > > at the same time as I defined A and B, then slow becomes twice as slow as before (430 microseconds vs. 210), even though I haven't even instantiated any C object. That's because the compiler no longer assumes that `evaluate` returns a Float (check code_warntype) and thus needs a second multiple-dispatch to make the + call. > >> >> and annotating with evaluate(features[i])::Float64 would let the compiler produce faster code, >> but it makes no difference. >> >> >> On Sunday, April 3, 2016 at 11:07:44 PM UTC+10, Cedric St-Jean wrote: >>> >>> Good call, it was already pointed out in that thread. >>> >>> On Sat, Apr 2, 2016 at 11:11 PM, Yichao Yu <[email protected]> wrote: >>>> >>>> On Sat, Apr 2, 2016 at 10:53 PM, Cedric St-Jean <[email protected]> wrote: >>>> > That's actually a compiler bug, nice! >>>> > >>>> > abstract Feature >>>> > >>>> > type A <: Feature end >>>> > evaluate(f::A) = 1.0 >>>> > >>>> > foo(features::Vector{Feature}) = isa(features[1], A) ? >>>> > evaluate(features[1]::A) : evaluate(features[1]) >>>> > >>>> > @show foo(Feature[A()]) >>>> > >>>> > type C <: Feature end >>>> > evaluate(f::C) = 100 >>>> > >>>> > @show foo(Feature[C()]) >>>> > >>>> > yields >>>> > >>>> > foo(Feature[A()]) = 1.0 >>>> > >>>> > foo(Feature[C()]) = 4.94e-322 >>>> > >>>> > >>>> > That explains why performance was the same on your computer: the compiler >>>> > was making an incorrect assumption about the return type of `evaluate`. Or >>>> > maybe it's an intentional gamble by the Julia devs, for the sake of >>>> > performance. >>>> > >>>> > I couldn't find any issue describing this. Yichao? >>>> >>>> This is effectively #265. It's not always predictable what assumption >>>> the compiler makes now... >>>> >>>> > >>>> > >>>> > On Saturday, April 2, 2016 at 10:16:59 PM UTC-4, Greg Plowman wrote: >>>> >> >>>> >> Thanks Cedric and Yichao. >>>> >> >>>> >> This makes sense that there might be new subtypes and associated >>>> >> specialised methods. I understand that now. Thanks. >>>> >> >>>> >> On my machine (v0.4.5 Windows), fast() and pretty_fast() seem to run in >>>> >> similar time. >>>> >> So I looked as @code_warntype as Yichao suggested and get the following. >>>> >> I don't fully know how to interpret output, but return type from the final >>>> >> "catchall" evaluate() seems to be inferred/asserted as Float64 (see >>>> >> highlighted yellow line below) >>>> >> >>>> >> Would this explain why pretty_fast() seems to be as efficient as fast()? >>>> >> >>>> >> Why is the return type being inferred asserted as Float64? >>>> >> >>>> >> >>>> >> julia> @code_warntype fast(features) >>>> >> Variables: >>>> >> features::Array{Feature,1} >>>> >> retval::Float64 >>>> >> #s1::Int64 >>>> >> i::Int64 >>>> >> >>>> >> Body: >>>> >> begin # none, line 2: >>>> >> retval = 0.0 # none, line 3: >>>> >> GenSym(2) = (Base.arraylen)(features::Array{Feature,1})::Int64 >>>> >> GenSym(0) = $(Expr(:new, UnitRange{Int64}, 1, >>>> >> :(((top(getfield))(Base.Intrinsics,:select_value)::I)((Base.sle_int)(1,GenSym(2))::Bool,GenSym(2),(Base.box) >>>> >> (Int64,(Base.sub_int)(1,1)))::Int64))) >>>> >> #s1 = (top(getfield))(GenSym(0),:start)::Int64 >>>> >> unless (Base.box)(Base.Bool,(Base.not_int)(#s1::Int64 === >>>> >> (Base.box)(Base.Int,(Base.add_int)((top(getfield))(GenSym(0),:stop)::Int64,1))::Bool)) >>>> >> goto 1 >>>> >> 2: >>>> >> GenSym(3) = #s1::Int64 >>>> >> GenSym(4) = (Base.box)(Base.Int,(Base.add_int)(#s1::Int64,1)) >>>> >> i = GenSym(3) >>>> >> #s1 = GenSym(4) # none, line 4: >>>> >> unless >>>> >> (Main.isa)((Base.arrayref)(features::Array{Feature,1},i::Int64)::Feature,Main.A)::Bool >>>> >> goto 4 # none, line 5: >>>> >> retval = >>>> >> (Base.box)(Base.Float64,(Base.add_float)(retval::Float64,1.0)) >>>> >> goto 5 >>>> >> 4: # none, line 7: >>>> >> retval = >>>> >> (Base.box)(Base.Float64,(Base.add_float)(retval::Float64,0.0)) >>>> >> 5: >>>> >> 3: >>>> >> unless >>>> >> (Base.box)(Base.Bool,(Base.not_int)((Base.box)(Base.Bool,(Base.not_int)(#s1::Int64 >>>> >> === (Base.box)(Base.Int,(Base.add_int)((top(getfield))(GenSym(0) >>>> >> ,:stop)::Int64,1))::Bool)))) goto 2 >>>> >> 1: >>>> >> 0: # none, line 10: >>>> >> return retval::Float64 >>>> >> end::Float64 >>>> >> >>>> >> >>>> >> julia> @code_warntype pretty_fast(features) >>>> >> Variables: >>>> >> features::Array{Feature,1} >>>> >> retval::Float64 >>>> >> #s1::Int64 >>>> >> i::Int64 >>>> >> >>>> >> Body: >>>> >> begin # none, line 2: >>>> >> retval = 0.0 # none, line 3: >>>> >> GenSym(2) = (Base.arraylen)(features::Array{Feature,1})::Int64 >>>> >> GenSym(0) = $(Expr(:new, UnitRange{Int64}, 1, >>>> >> :(((top(getfield))(Base.Intrinsics,:select_value)::I)((Base.sle_int)(1,GenSym(2))::Bool,GenSym(2),(Base.box) >>>> >> (Int64,(Base.sub_int)(1,1)))::Int64))) >>>> >> #s1 = (top(getfield))(GenSym(0),:start)::Int64 >>>> >> unless (Base.box)(Base.Bool,(Base.not_int)(#s1::Int64 === >>>> >> (Base.box)(Base.Int,(Base.add_int)((top(getfield))(GenSym(0),:stop)::Int64,1))::Bool)) >>>> >> goto 1 >>>> >> 2: >>>> >> GenSym(4) = #s1::Int64 >>>> >> GenSym(5) = (Base.box)(Base.Int,(Base.add_int)(#s1::Int64,1)) >>>> >> i = GenSym(4) >>>> >> #s1 = GenSym(5) # none, line 4: >>>> >> unless >>>> >> (Main.isa)((Base.arrayref)(features::Array{Feature,1},i::Int64)::Feature,Main.A)::Bool >>>> >> goto 4 # none, line 5: >>>> >> retval = >>>> >> (Base.box)(Base.Float64,(Base.add_float)(retval::Float64,1.0)) >>>> >> goto 6 >>>> >> 4: # none, line 6: >>>> >> unless >>>> >> (Main.isa)((Base.arrayref)(features::Array{Feature,1},i::Int64)::Feature,Main.B)::Bool >>>> >> goto 5 # none, line 7: >>>> >> retval = >>>> >> (Base.box)(Base.Float64,(Base.add_float)(retval::Float64,0.0)) >>>> >> goto 6 >>>> >> 5: # none, line 9: >>>> >> GenSym(3) = >>>> >> (Main.evaluate)((Base.arrayref)(features::Array{Feature,1},i::Int64)::Feature)::Float64 >>>> >> retval = >>>> >> (Base.box)(Base.Float64,(Base.add_float)(retval::Float64,GenSym(3))) >>>> >> 6: >>>> >> 3: >>>> >> unless >>>> >> (Base.box)(Base.Bool,(Base.not_int)((Base.box)(Base.Bool,(Base.not_int)(#s1::Int64 >>>> >> === (Base.box)(Base.Int,(Base.add_int)((top(getfield))(GenSym(0) >>>> >> ,:stop)::Int64,1))::Bool)))) goto 2 >>>> >> 1: >>>> >> 0: # none, line 12: >>>> >> return retval::Float64 >>>> >> end::Float64 >>>> >> >>>> >> Julia> >>>> >> >>>> >> >>>> >> >>>> >> On Sunday, April 3, 2016 at 8:04:35 AM UTC+10, Cedric St-Jean wrote: >>>> >>> >>>> >>> >>>> >>> >>>> >>> On Saturday, April 2, 2016 at 5:39:45 PM UTC-4, Greg Plowman wrote: >>>> >>>> >>>> >>>> Cedric, >>>> >>>> On my machine fast() and pretty_fast() run in the roughly the same time. >>>> >>>> Are you sure pre-compiled first? >>>> >>> >>>> >>> >>>> >>> Yep. I'm using @benchmark on Julia 0.4.5 on OSX, and the time difference >>>> >>> is definitely > 2X. >>>> >>> >>>> >>>> >>>> >>>> >>>> >>>> 1. If you add a default fallback method, say, evaluate(f::Feature) = >>>> >>>> -1.0 >>>> >>>> Theoretically, would that inform the compiler that evaluate() always >>>> >>>> returns a Float64 and therefore is type stable? >>>> >>> >>>> >>> >>>> >>> No, because I might write >>>> >>> >>>> >>> type C <: Feature >>>> >>> end >>>> >>> evaluate(f::C) = 100 >>>> >>> >>>> >>> and a Vector{Feature} might contain a C. Abstract types are open-ended, >>>> >>> new types can be created a runtime, so the compiler can't assume anything >>>> >>> about them, which is why they're not useful for performance. See here. >>>> >>> >>>> >>> >>>> >>>> >>>> >>>> Does dynamic dispatch mean compiler has to look up method at run time >>>> >>>> (because it can't know type ahead of time). >>>> >>>> This is equivalent to explicitly coding "static dispatch" with if >>>> >>>> statements? >>>> >>>> If so then why is it so much slower? >>>> >>>> Providing a fallback method as in 1. or explicitly returning Float64 >>>> >>>> doesn't seem to improve speed, which naively suggests that slowness is from >>>> >>>> dynamic dispatch not from type instability??? >>>> >>>> >>>> >>>> 3. Also, on my machine pretty_fast() is as fast as fast(). Why is this >>>> >>>> so, if pretty_fast() is supposedly type unstable? >>>> >>>> >>>> >>>> As you can probably tell, I'm pretty confused on this. >>>> >>>> >>>> >>>> >>>> >>>> On Sunday, April 3, 2016 at 6:34:09 AM UTC+10, Cedric St-Jean wrote: >>>> >>>>> >>>> >>>>> Thank you for the detailed explanation. I tried it out: >>>> >>>>> >>>> >>>>> function pretty_fast(features::Vector{Feature}) >>>> >>>>> retval = 0.0 >>>> >>>>> for i in 1 : length(features) >>>> >>>>> if isa(features[i], A) >>>> >>>>> x = evaluate(features[i]::A) >>>> >>>>> elseif isa(features[i], B) >>>> >>>>> x = evaluate(features[i]::B) >>>> >>>>> else >>>> >>>>> x = evaluate(features[i]) >>>> >>>>> end >>>> >>>>> retval += x >>>> >>>>> end >>>> >>>>> retval >>>> >>>>> end >>>> >>>>> >>>> >>>>> On my laptop, fast runs in 10 microseconds, pretty_fast in 30, and slow >>>> >>>>> in 210. >>>> >>>>> >>>> >>>>> On Saturday, April 2, 2016 at 12:24:18 PM UTC-4, Yichao Yu wrote: >>>> >>>>>> >>>> >>>>>> On Sat, Apr 2, 2016 at 12:16 PM, Tim Wheeler < [email protected]> >>>> >>>>>> wrote: >>>> >>>>>> > Thank you for the comments. In my original code it means the >>>> >>>>>> > difference >>>> >>>>>> > between a 30 min execution with memory allocation in the Gigabytes >>>> >>>>>> > and a few >>>> >>>>>> > seconds of execution with only 800 bytes using the second version. >>>> >>>>>> > I thought under-the-hood Julia basically runs those if statements >>>> >>>>>> > anyway for >>>> >>>>>> > its dispatch, and don't know why it needs to allocate any memory. >>>> >>>>>> > Having the if-statement workaround will be fine though. >>>> >>>>>> >>>> >>>>>> Well, if you have a lot of these cheap functions being dynamically >>>> >>>>>> dispatched I think it is not a good way to use the type. Depending on >>>> >>>>>> your problem, you may be better off using a enum/flags/dict to >>>> >>>>>> represent the type/get the values. >>>> >>>>>> >>>> >>>>>> The reason for the allocation is that the return type is unknown. It >>>> >>>>>> should be obvious to see if you check your code with code_warntype. >>>> >>>>>> >>>> >>>>>> > >>>> >>>>>> > On Saturday, April 2, 2016 at 7:26:11 AM UTC-7, Cedric St-Jean >>>> >>>>>> > wrote: >>>> >>>>>> >> >>>> >>>>>> >> >>>> >>>>>> >>> Therefore there's no way the compiler can rewrite the slow version >>>> >>>>>> >>> to the >>>> >>>>>> >>> fast version. >>>> >>>>>> >> >>>> >>>>>> >> >>>> >>>>>> >> It knows that the element type is a Feature, so it could produce: >>>> >>>>>> >> >>>> >>>>>> >> if isa(features[i], A) >>>> >>>>>> >> retval += evaluate(features[i]::A) >>>> >>>>>> >> elseif isa(features[i], B) >>>> >>>>>> >> retval += evaluate(features[i]::B) >>>> >>>>>> >> else >>>> >>>>>> >> retval += evaluate(features[i]) >>>> >>>>>> >> end >>>> >>>>>> >> >>>> >>>>>> >> and it would make sense for abstract types that have few subtypes. >>>> >>>>>> >> I >>>> >>>>>> >> didn't realize that dispatch was an order of magnitude slower than >>>> >>>>>> >> type >>>> >>>>>> >> checking. It's easy enough to write a macro generating this >>>> >>>>>> >> expansion, too. >>>> >>>>>> >> >>>> >>>>>> >> On Saturday, April 2, 2016 at 2:05:20 AM UTC-4, Yichao Yu wrote: >>>> >>>>>> >>> >>>> >>>>>> >>> On Fri, Apr 1, 2016 at 9:56 PM, Tim Wheeler >>>> >>>>>> >>> <[email protected]> >>>> >>>>>> >>> wrote: >>>> >>>>>> >>> > Hello Julia Users. >>>> >>>>>> >>> > >>>> >>>>>> >>> > I ran into a weird slowdown issue and reproduced a minimal >>>> >>>>>> >>> > working >>>> >>>>>> >>> > example. >>>> >>>>>> >>> > Maybe someone can help shed some light. >>>> >>>>>> >>> > >>>> >>>>>> >>> > abstract Feature >>>> >>>>>> >>> > >>>> >>>>>> >>> > type A <: Feature end >>>> >>>>>> >>> > evaluate(f::A) = 1.0 >>>> >>>>>> >>> > >>>> >>>>>> >>> > type B <: Feature end >>>> >>>>>> >>> > evaluate(f::B) = 0.0 >>>> >>>>>> >>> > >>>> >>>>>> >>> > function slow(features::Vector{Feature}) >>>> >>>>>> >>> > retval = 0.0 >>>> >>>>>> >>> > for i in 1 : length(features) >>>> >>>>>> >>> > retval += evaluate(features[i]) >>>> >>>>>> >>> > end >>>> >>>>>> >>> > retval >>>> >>>>>> >>> > end >>>> >>>>>> >>> > >>>> >>>>>> >>> > function fast(features::Vector{Feature}) >>>> >>>>>> >>> > retval = 0.0 >>>> >>>>>> >>> > for i in 1 : length(features) >>>> >>>>>> >>> > if isa(features[i], A) >>>> >>>>>> >>> > retval += evaluate(features[i]::A) >>>> >>>>>> >>> > else >>>> >>>>>> >>> > retval += evaluate(features[i]::B) >>>> >>>>>> >>> > end >>>> >>>>>> >>> > end >>>> >>>>>> >>> > retval >>>> >>>>>> >>> > end >>>> >>>>>> >>> > >>>> >>>>>> >>> > using ProfileView >>>> >>>>>> >>> > >>>> >>>>>> >>> > features = Feature[] >>>> >>>>>> >>> > for i in 1 : 10000 >>>> >>>>>> >>> > push!(features, A()) >>>> >>>>>> >>> > end >>>> >>>>>> >>> > >>>> >>>>>> >>> > slow(features) >>>> >>>>>> >>> > @time slow(features) >>>> >>>>>> >>> > fast(features) >>>> >>>>>> >>> > @time fast(features) >>>> >>>>>> >>> > >>>> >>>>>> >>> > The output is: >>>> >>>>>> >>> > >>>> >>>>>> >>> > 0.000136 seconds (10.15 k allocations: 166.417 KB) >>>> >>>>>> >>> > 0.000012 seconds (5 allocations: 176 bytes) >>>> >>>>>> >>> > >>>> >>>>>> >>> > >>>> >>>>>> >>> > This is a HUGE difference! Am I missing something big? Is there >>>> >>>>>> >>> > a good >>>> >>>>>> >>> > way >>>> >>>>>> >>> > to inspect code to figure out where I am going wrong? >>>> >>>>>> >>> >>>> >>>>>> >>> This is because of type instability as you will find in the >>>> >>>>>> >>> performance >>>> >>>>>> >>> tips. >>>> >>>>>> >>> Note that slow and fast are not equivalent since the fast version >>>> >>>>>> >>> only >>>> >>>>>> >>> accept `A` or `B` but the slow version accepts any subtype of >>>> >>>>>> >>> feature >>>> >>>>>> >>> that you may ever define. Therefore there's no way the compiler >>>> >>>>>> >>> can >>>> >>>>>> >>> rewrite the slow version to the fast version. >>>> >>>>>> >>> There are optimizations that can be applied to bring down the gap >>>> >>>>>> >>> but >>>> >>>>>> >>> there'll always be a large difference between the two. >>>> >>>>>> >>> >>>> >>>>>> >>> > >>>> >>>>>> >>> > >>>> >>>>>> >>> > Thank you in advance for any guidance. >>>> >>>>>> >>> > >>>> >>>>>> >>> > >>>> >>>>>> >>> > -Tim >>>> >>>>>> >>> > >>>> >>>>>> >>> > >>>> >>>>>> >>> > >>>> >>>>>> >>> > >>>> >>>>>> >>> > >>> >>>
