Thanks for your replies. I'm sorry to trouble you again, but I'm still confused about general concepts.
It seems to me that slowness of dynamic dispatch and type instability are orthogonal. I mean is dynamic dispatch inherently slow, or is it slow because it involves type instability? @code_warntype slow(features) suggests that the return type from evaluate() is Float64, so I can't see any type instability. If this is correct, then why is dynamic dispatch so much slower than run-time if statements? For my own understanding, I was almost hoping that slow() was not type-stable, and annotating with evaluate(features[i])::Float64 would let the compiler produce faster code, but it makes no difference. On Sunday, April 3, 2016 at 11:07:44 PM UTC+10, Cedric St-Jean wrote: > Good call, it was already pointed ou > <https://github.com/JuliaLang/julia/issues/265#issuecomment-42047587>t in > that thread. > > On Sat, Apr 2, 2016 at 11:11 PM, Yichao Yu <[email protected] <javascript:> > > wrote: > >> On Sat, Apr 2, 2016 at 10:53 PM, Cedric St-Jean <[email protected] >> <javascript:>> wrote: >> > That's actually a compiler bug, nice! >> > >> > abstract Feature >> > >> > type A <: Feature end >> > evaluate(f::A) = 1.0 >> > >> > foo(features::Vector{Feature}) = isa(features[1], A) ? >> > evaluate(features[1]::A) : evaluate(features[1]) >> > >> > @show foo(Feature[A()]) >> > >> > type C <: Feature end >> > evaluate(f::C) = 100 >> > >> > @show foo(Feature[C()]) >> > >> > yields >> > >> > foo(Feature[A()]) = 1.0 >> > >> > foo(Feature[C()]) = 4.94e-322 >> > >> > >> > That explains why performance was the same on your computer: the >> compiler >> > was making an incorrect assumption about the return type of `evaluate`. >> Or >> > maybe it's an intentional gamble by the Julia devs, for the sake of >> > performance. >> > >> > I couldn't find any issue describing this. Yichao? >> >> This is effectively #265. It's not always predictable what assumption >> the compiler makes now... >> >> > >> > >> > On Saturday, April 2, 2016 at 10:16:59 PM UTC-4, Greg Plowman wrote: >> >> >> >> Thanks Cedric and Yichao. >> >> >> >> This makes sense that there might be new subtypes and associated >> >> specialised methods. I understand that now. Thanks. >> >> >> >> On my machine (v0.4.5 Windows), fast() and pretty_fast() seem to run in >> >> similar time. >> >> So I looked as @code_warntype as Yichao suggested and get the >> following. >> >> I don't fully know how to interpret output, but return type from the >> final >> >> "catchall" evaluate() seems to be inferred/asserted as Float64 (see >> >> highlighted yellow line below) >> >> >> >> Would this explain why pretty_fast() seems to be as efficient as >> fast()? >> >> >> >> Why is the return type being inferred asserted as Float64? >> >> >> >> >> >> julia> @code_warntype fast(features) >> >> Variables: >> >> features::Array{Feature,1} >> >> retval::Float64 >> >> #s1::Int64 >> >> i::Int64 >> >> >> >> Body: >> >> begin # none, line 2: >> >> retval = 0.0 # none, line 3: >> >> GenSym(2) = (Base.arraylen)(features::Array{Feature,1})::Int64 >> >> GenSym(0) = $(Expr(:new, UnitRange{Int64}, 1, >> >> >> :(((top(getfield))(Base.Intrinsics,:select_value)::I)((Base.sle_int)(1,GenSym(2))::Bool,GenSym(2),(Base.box) >> >> (Int64,(Base.sub_int)(1,1)))::Int64))) >> >> #s1 = (top(getfield))(GenSym(0),:start)::Int64 >> >> unless (Base.box)(Base.Bool,(Base.not_int)(#s1::Int64 === >> >> >> (Base.box)(Base.Int,(Base.add_int)((top(getfield))(GenSym(0),:stop)::Int64,1))::Bool)) >> >> goto 1 >> >> 2: >> >> GenSym(3) = #s1::Int64 >> >> GenSym(4) = (Base.box)(Base.Int,(Base.add_int)(#s1::Int64,1)) >> >> i = GenSym(3) >> >> #s1 = GenSym(4) # none, line 4: >> >> unless >> >> >> (Main.isa)((Base.arrayref)(features::Array{Feature,1},i::Int64)::Feature,Main.A)::Bool >> >> goto 4 # none, line 5: >> >> retval = >> >> (Base.box)(Base.Float64,(Base.add_float)(retval::Float64,1.0)) >> >> goto 5 >> >> 4: # none, line 7: >> >> retval = >> >> (Base.box)(Base.Float64,(Base.add_float)(retval::Float64,0.0)) >> >> 5: >> >> 3: >> >> unless >> >> >> (Base.box)(Base.Bool,(Base.not_int)((Base.box)(Base.Bool,(Base.not_int)(#s1::Int64 >> >> === (Base.box)(Base.Int,(Base.add_int)((top(getfield))(GenSym(0) >> >> ,:stop)::Int64,1))::Bool)))) goto 2 >> >> 1: >> >> 0: # none, line 10: >> >> return retval::Float64 >> >> end::Float64 >> >> >> >> >> >> julia> @code_warntype pretty_fast(features) >> >> Variables: >> >> features::Array{Feature,1} >> >> retval::Float64 >> >> #s1::Int64 >> >> i::Int64 >> >> >> >> Body: >> >> begin # none, line 2: >> >> retval = 0.0 # none, line 3: >> >> GenSym(2) = (Base.arraylen)(features::Array{Feature,1})::Int64 >> >> GenSym(0) = $(Expr(:new, UnitRange{Int64}, 1, >> >> >> :(((top(getfield))(Base.Intrinsics,:select_value)::I)((Base.sle_int)(1,GenSym(2))::Bool,GenSym(2),(Base.box) >> >> (Int64,(Base.sub_int)(1,1)))::Int64))) >> >> #s1 = (top(getfield))(GenSym(0),:start)::Int64 >> >> unless (Base.box)(Base.Bool,(Base.not_int)(#s1::Int64 === >> >> >> (Base.box)(Base.Int,(Base.add_int)((top(getfield))(GenSym(0),:stop)::Int64,1))::Bool)) >> >> goto 1 >> >> 2: >> >> GenSym(4) = #s1::Int64 >> >> GenSym(5) = (Base.box)(Base.Int,(Base.add_int)(#s1::Int64,1)) >> >> i = GenSym(4) >> >> #s1 = GenSym(5) # none, line 4: >> >> unless >> >> >> (Main.isa)((Base.arrayref)(features::Array{Feature,1},i::Int64)::Feature,Main.A)::Bool >> >> goto 4 # none, line 5: >> >> retval = >> >> (Base.box)(Base.Float64,(Base.add_float)(retval::Float64,1.0)) >> >> goto 6 >> >> 4: # none, line 6: >> >> unless >> >> >> (Main.isa)((Base.arrayref)(features::Array{Feature,1},i::Int64)::Feature,Main.B)::Bool >> >> goto 5 # none, line 7: >> >> retval = >> >> (Base.box)(Base.Float64,(Base.add_float)(retval::Float64,0.0)) >> >> goto 6 >> >> 5: # none, line 9: >> >> GenSym(3) = >> >> >> (Main.evaluate)((Base.arrayref)(features::Array{Feature,1},i::Int64)::Feature)::Float64 >> >> retval = >> >> (Base.box)(Base.Float64,(Base.add_float)(retval::Float64,GenSym(3))) >> >> 6: >> >> 3: >> >> unless >> >> >> (Base.box)(Base.Bool,(Base.not_int)((Base.box)(Base.Bool,(Base.not_int)(#s1::Int64 >> >> === (Base.box)(Base.Int,(Base.add_int)((top(getfield))(GenSym(0) >> >> ,:stop)::Int64,1))::Bool)))) goto 2 >> >> 1: >> >> 0: # none, line 12: >> >> return retval::Float64 >> >> end::Float64 >> >> >> >> Julia> >> >> >> >> >> >> >> >> On Sunday, April 3, 2016 at 8:04:35 AM UTC+10, Cedric St-Jean wrote: >> >>> >> >>> >> >>> >> >>> On Saturday, April 2, 2016 at 5:39:45 PM UTC-4, Greg Plowman wrote: >> >>>> >> >>>> Cedric, >> >>>> On my machine fast() and pretty_fast() run in the roughly the same >> time. >> >>>> Are you sure pre-compiled first? >> >>> >> >>> >> >>> Yep. I'm using @benchmark on Julia 0.4.5 on OSX, and the time >> difference >> >>> is definitely > 2X. >> >>> >> >>>> >> >>>> >> >>>> 1. If you add a default fallback method, say, evaluate(f::Feature) = >> >>>> -1.0 >> >>>> Theoretically, would that inform the compiler that evaluate() always >> >>>> returns a Float64 and therefore is type stable? >> >>> >> >>> >> >>> No, because I might write >> >>> >> >>> type C <: Feature >> >>> end >> >>> evaluate(f::C) = 100 >> >>> >> >>> and a Vector{Feature} might contain a C. Abstract types are >> open-ended, >> >>> new types can be created a runtime, so the compiler can't assume >> anything >> >>> about them, which is why they're not useful for performance. See here. >> >>> >> >>> >> >>>> >> >>>> Does dynamic dispatch mean compiler has to look up method at run time >> >>>> (because it can't know type ahead of time). >> >>>> This is equivalent to explicitly coding "static dispatch" with if >> >>>> statements? >> >>>> If so then why is it so much slower? >> >>>> Providing a fallback method as in 1. or explicitly returning Float64 >> >>>> doesn't seem to improve speed, which naively suggests that slowness >> is from >> >>>> dynamic dispatch not from type instability??? >> >>>> >> >>>> 3. Also, on my machine pretty_fast() is as fast as fast(). Why is >> this >> >>>> so, if pretty_fast() is supposedly type unstable? >> >>>> >> >>>> As you can probably tell, I'm pretty confused on this. >> >>>> >> >>>> >> >>>> On Sunday, April 3, 2016 at 6:34:09 AM UTC+10, Cedric St-Jean wrote: >> >>>>> >> >>>>> Thank you for the detailed explanation. I tried it out: >> >>>>> >> >>>>> function pretty_fast(features::Vector{Feature}) >> >>>>> retval = 0.0 >> >>>>> for i in 1 : length(features) >> >>>>> if isa(features[i], A) >> >>>>> x = evaluate(features[i]::A) >> >>>>> elseif isa(features[i], B) >> >>>>> x = evaluate(features[i]::B) >> >>>>> else >> >>>>> x = evaluate(features[i]) >> >>>>> end >> >>>>> retval += x >> >>>>> end >> >>>>> retval >> >>>>> end >> >>>>> >> >>>>> On my laptop, fast runs in 10 microseconds, pretty_fast in 30, and >> slow >> >>>>> in 210. >> >>>>> >> >>>>> On Saturday, April 2, 2016 at 12:24:18 PM UTC-4, Yichao Yu wrote: >> >>>>>> >> >>>>>> On Sat, Apr 2, 2016 at 12:16 PM, Tim Wheeler < >> [email protected]> >> >>>>>> wrote: >> >>>>>> > Thank you for the comments. In my original code it means the >> >>>>>> > difference >> >>>>>> > between a 30 min execution with memory allocation in the >> Gigabytes >> >>>>>> > and a few >> >>>>>> > seconds of execution with only 800 bytes using the second >> version. >> >>>>>> > I thought under-the-hood Julia basically runs those if statements >> >>>>>> > anyway for >> >>>>>> > its dispatch, and don't know why it needs to allocate any memory. >> >>>>>> > Having the if-statement workaround will be fine though. >> >>>>>> >> >>>>>> Well, if you have a lot of these cheap functions being dynamically >> >>>>>> dispatched I think it is not a good way to use the type. Depending >> on >> >>>>>> your problem, you may be better off using a enum/flags/dict to >> >>>>>> represent the type/get the values. >> >>>>>> >> >>>>>> The reason for the allocation is that the return type is unknown. >> It >> >>>>>> should be obvious to see if you check your code with code_warntype. >> >>>>>> >> >>>>>> > >> >>>>>> > On Saturday, April 2, 2016 at 7:26:11 AM UTC-7, Cedric St-Jean >> >>>>>> > wrote: >> >>>>>> >> >> >>>>>> >> >> >>>>>> >>> Therefore there's no way the compiler can rewrite the slow >> version >> >>>>>> >>> to the >> >>>>>> >>> fast version. >> >>>>>> >> >> >>>>>> >> >> >>>>>> >> It knows that the element type is a Feature, so it could >> produce: >> >>>>>> >> >> >>>>>> >> if isa(features[i], A) >> >>>>>> >> retval += evaluate(features[i]::A) >> >>>>>> >> elseif isa(features[i], B) >> >>>>>> >> retval += evaluate(features[i]::B) >> >>>>>> >> else >> >>>>>> >> retval += evaluate(features[i]) >> >>>>>> >> end >> >>>>>> >> >> >>>>>> >> and it would make sense for abstract types that have few >> subtypes. >> >>>>>> >> I >> >>>>>> >> didn't realize that dispatch was an order of magnitude slower >> than >> >>>>>> >> type >> >>>>>> >> checking. It's easy enough to write a macro generating this >> >>>>>> >> expansion, too. >> >>>>>> >> >> >>>>>> >> On Saturday, April 2, 2016 at 2:05:20 AM UTC-4, Yichao Yu wrote: >> >>>>>> >>> >> >>>>>> >>> On Fri, Apr 1, 2016 at 9:56 PM, Tim Wheeler >> >>>>>> >>> <[email protected]> >> >>>>>> >>> wrote: >> >>>>>> >>> > Hello Julia Users. >> >>>>>> >>> > >> >>>>>> >>> > I ran into a weird slowdown issue and reproduced a minimal >> >>>>>> >>> > working >> >>>>>> >>> > example. >> >>>>>> >>> > Maybe someone can help shed some light. >> >>>>>> >>> > >> >>>>>> >>> > abstract Feature >> >>>>>> >>> > >> >>>>>> >>> > type A <: Feature end >> >>>>>> >>> > evaluate(f::A) = 1.0 >> >>>>>> >>> > >> >>>>>> >>> > type B <: Feature end >> >>>>>> >>> > evaluate(f::B) = 0.0 >> >>>>>> >>> > >> >>>>>> >>> > function slow(features::Vector{Feature}) >> >>>>>> >>> > retval = 0.0 >> >>>>>> >>> > for i in 1 : length(features) >> >>>>>> >>> > retval += evaluate(features[i]) >> >>>>>> >>> > end >> >>>>>> >>> > retval >> >>>>>> >>> > end >> >>>>>> >>> > >> >>>>>> >>> > function fast(features::Vector{Feature}) >> >>>>>> >>> > retval = 0.0 >> >>>>>> >>> > for i in 1 : length(features) >> >>>>>> >>> > if isa(features[i], A) >> >>>>>> >>> > retval += evaluate(features[i]::A) >> >>>>>> >>> > else >> >>>>>> >>> > retval += evaluate(features[i]::B) >> >>>>>> >>> > end >> >>>>>> >>> > end >> >>>>>> >>> > retval >> >>>>>> >>> > end >> >>>>>> >>> > >> >>>>>> >>> > using ProfileView >> >>>>>> >>> > >> >>>>>> >>> > features = Feature[] >> >>>>>> >>> > for i in 1 : 10000 >> >>>>>> >>> > push!(features, A()) >> >>>>>> >>> > end >> >>>>>> >>> > >> >>>>>> >>> > slow(features) >> >>>>>> >>> > @time slow(features) >> >>>>>> >>> > fast(features) >> >>>>>> >>> > @time fast(features) >> >>>>>> >>> > >> >>>>>> >>> > The output is: >> >>>>>> >>> > >> >>>>>> >>> > 0.000136 seconds (10.15 k allocations: 166.417 KB) >> >>>>>> >>> > 0.000012 seconds (5 allocations: 176 bytes) >> >>>>>> >>> > >> >>>>>> >>> > >> >>>>>> >>> > This is a HUGE difference! Am I missing something big? Is >> there >> >>>>>> >>> > a good >> >>>>>> >>> > way >> >>>>>> >>> > to inspect code to figure out where I am going wrong? >> >>>>>> >>> >> >>>>>> >>> This is because of type instability as you will find in the >> >>>>>> >>> performance >> >>>>>> >>> tips. >> >>>>>> >>> Note that slow and fast are not equivalent since the fast >> version >> >>>>>> >>> only >> >>>>>> >>> accept `A` or `B` but the slow version accepts any subtype of >> >>>>>> >>> feature >> >>>>>> >>> that you may ever define. Therefore there's no way the compiler >> >>>>>> >>> can >> >>>>>> >>> rewrite the slow version to the fast version. >> >>>>>> >>> There are optimizations that can be applied to bring down the >> gap >> >>>>>> >>> but >> >>>>>> >>> there'll always be a large difference between the two. >> >>>>>> >>> >> >>>>>> >>> > >> >>>>>> >>> > >> >>>>>> >>> > Thank you in advance for any guidance. >> >>>>>> >>> > >> >>>>>> >>> > >> >>>>>> >>> > -Tim >> >>>>>> >>> > >> >>>>>> >>> > >> >>>>>> >>> > >> >>>>>> >>> > >> >>>>>> >>> > >> > >
