Good call, it was already pointed ou <https://github.com/JuliaLang/julia/issues/265#issuecomment-42047587>t in that thread.
On Sat, Apr 2, 2016 at 11:11 PM, Yichao Yu <[email protected]> wrote: > On Sat, Apr 2, 2016 at 10:53 PM, Cedric St-Jean <[email protected]> > wrote: > > That's actually a compiler bug, nice! > > > > abstract Feature > > > > type A <: Feature end > > evaluate(f::A) = 1.0 > > > > foo(features::Vector{Feature}) = isa(features[1], A) ? > > evaluate(features[1]::A) : evaluate(features[1]) > > > > @show foo(Feature[A()]) > > > > type C <: Feature end > > evaluate(f::C) = 100 > > > > @show foo(Feature[C()]) > > > > yields > > > > foo(Feature[A()]) = 1.0 > > > > foo(Feature[C()]) = 4.94e-322 > > > > > > That explains why performance was the same on your computer: the compiler > > was making an incorrect assumption about the return type of `evaluate`. > Or > > maybe it's an intentional gamble by the Julia devs, for the sake of > > performance. > > > > I couldn't find any issue describing this. Yichao? > > This is effectively #265. It's not always predictable what assumption > the compiler makes now... > > > > > > > On Saturday, April 2, 2016 at 10:16:59 PM UTC-4, Greg Plowman wrote: > >> > >> Thanks Cedric and Yichao. > >> > >> This makes sense that there might be new subtypes and associated > >> specialised methods. I understand that now. Thanks. > >> > >> On my machine (v0.4.5 Windows), fast() and pretty_fast() seem to run in > >> similar time. > >> So I looked as @code_warntype as Yichao suggested and get the following. > >> I don't fully know how to interpret output, but return type from the > final > >> "catchall" evaluate() seems to be inferred/asserted as Float64 (see > >> highlighted yellow line below) > >> > >> Would this explain why pretty_fast() seems to be as efficient as fast()? > >> > >> Why is the return type being inferred asserted as Float64? > >> > >> > >> julia> @code_warntype fast(features) > >> Variables: > >> features::Array{Feature,1} > >> retval::Float64 > >> #s1::Int64 > >> i::Int64 > >> > >> Body: > >> begin # none, line 2: > >> retval = 0.0 # none, line 3: > >> GenSym(2) = (Base.arraylen)(features::Array{Feature,1})::Int64 > >> GenSym(0) = $(Expr(:new, UnitRange{Int64}, 1, > >> > :(((top(getfield))(Base.Intrinsics,:select_value)::I)((Base.sle_int)(1,GenSym(2))::Bool,GenSym(2),(Base.box) > >> (Int64,(Base.sub_int)(1,1)))::Int64))) > >> #s1 = (top(getfield))(GenSym(0),:start)::Int64 > >> unless (Base.box)(Base.Bool,(Base.not_int)(#s1::Int64 === > >> > (Base.box)(Base.Int,(Base.add_int)((top(getfield))(GenSym(0),:stop)::Int64,1))::Bool)) > >> goto 1 > >> 2: > >> GenSym(3) = #s1::Int64 > >> GenSym(4) = (Base.box)(Base.Int,(Base.add_int)(#s1::Int64,1)) > >> i = GenSym(3) > >> #s1 = GenSym(4) # none, line 4: > >> unless > >> > (Main.isa)((Base.arrayref)(features::Array{Feature,1},i::Int64)::Feature,Main.A)::Bool > >> goto 4 # none, line 5: > >> retval = > >> (Base.box)(Base.Float64,(Base.add_float)(retval::Float64,1.0)) > >> goto 5 > >> 4: # none, line 7: > >> retval = > >> (Base.box)(Base.Float64,(Base.add_float)(retval::Float64,0.0)) > >> 5: > >> 3: > >> unless > >> > (Base.box)(Base.Bool,(Base.not_int)((Base.box)(Base.Bool,(Base.not_int)(#s1::Int64 > >> === (Base.box)(Base.Int,(Base.add_int)((top(getfield))(GenSym(0) > >> ,:stop)::Int64,1))::Bool)))) goto 2 > >> 1: > >> 0: # none, line 10: > >> return retval::Float64 > >> end::Float64 > >> > >> > >> julia> @code_warntype pretty_fast(features) > >> Variables: > >> features::Array{Feature,1} > >> retval::Float64 > >> #s1::Int64 > >> i::Int64 > >> > >> Body: > >> begin # none, line 2: > >> retval = 0.0 # none, line 3: > >> GenSym(2) = (Base.arraylen)(features::Array{Feature,1})::Int64 > >> GenSym(0) = $(Expr(:new, UnitRange{Int64}, 1, > >> > :(((top(getfield))(Base.Intrinsics,:select_value)::I)((Base.sle_int)(1,GenSym(2))::Bool,GenSym(2),(Base.box) > >> (Int64,(Base.sub_int)(1,1)))::Int64))) > >> #s1 = (top(getfield))(GenSym(0),:start)::Int64 > >> unless (Base.box)(Base.Bool,(Base.not_int)(#s1::Int64 === > >> > (Base.box)(Base.Int,(Base.add_int)((top(getfield))(GenSym(0),:stop)::Int64,1))::Bool)) > >> goto 1 > >> 2: > >> GenSym(4) = #s1::Int64 > >> GenSym(5) = (Base.box)(Base.Int,(Base.add_int)(#s1::Int64,1)) > >> i = GenSym(4) > >> #s1 = GenSym(5) # none, line 4: > >> unless > >> > (Main.isa)((Base.arrayref)(features::Array{Feature,1},i::Int64)::Feature,Main.A)::Bool > >> goto 4 # none, line 5: > >> retval = > >> (Base.box)(Base.Float64,(Base.add_float)(retval::Float64,1.0)) > >> goto 6 > >> 4: # none, line 6: > >> unless > >> > (Main.isa)((Base.arrayref)(features::Array{Feature,1},i::Int64)::Feature,Main.B)::Bool > >> goto 5 # none, line 7: > >> retval = > >> (Base.box)(Base.Float64,(Base.add_float)(retval::Float64,0.0)) > >> goto 6 > >> 5: # none, line 9: > >> GenSym(3) = > >> > (Main.evaluate)((Base.arrayref)(features::Array{Feature,1},i::Int64)::Feature)::Float64 > >> retval = > >> (Base.box)(Base.Float64,(Base.add_float)(retval::Float64,GenSym(3))) > >> 6: > >> 3: > >> unless > >> > (Base.box)(Base.Bool,(Base.not_int)((Base.box)(Base.Bool,(Base.not_int)(#s1::Int64 > >> === (Base.box)(Base.Int,(Base.add_int)((top(getfield))(GenSym(0) > >> ,:stop)::Int64,1))::Bool)))) goto 2 > >> 1: > >> 0: # none, line 12: > >> return retval::Float64 > >> end::Float64 > >> > >> Julia> > >> > >> > >> > >> On Sunday, April 3, 2016 at 8:04:35 AM UTC+10, Cedric St-Jean wrote: > >>> > >>> > >>> > >>> On Saturday, April 2, 2016 at 5:39:45 PM UTC-4, Greg Plowman wrote: > >>>> > >>>> Cedric, > >>>> On my machine fast() and pretty_fast() run in the roughly the same > time. > >>>> Are you sure pre-compiled first? > >>> > >>> > >>> Yep. I'm using @benchmark on Julia 0.4.5 on OSX, and the time > difference > >>> is definitely > 2X. > >>> > >>>> > >>>> > >>>> 1. If you add a default fallback method, say, evaluate(f::Feature) = > >>>> -1.0 > >>>> Theoretically, would that inform the compiler that evaluate() always > >>>> returns a Float64 and therefore is type stable? > >>> > >>> > >>> No, because I might write > >>> > >>> type C <: Feature > >>> end > >>> evaluate(f::C) = 100 > >>> > >>> and a Vector{Feature} might contain a C. Abstract types are open-ended, > >>> new types can be created a runtime, so the compiler can't assume > anything > >>> about them, which is why they're not useful for performance. See here. > >>> > >>> > >>>> > >>>> Does dynamic dispatch mean compiler has to look up method at run time > >>>> (because it can't know type ahead of time). > >>>> This is equivalent to explicitly coding "static dispatch" with if > >>>> statements? > >>>> If so then why is it so much slower? > >>>> Providing a fallback method as in 1. or explicitly returning Float64 > >>>> doesn't seem to improve speed, which naively suggests that slowness > is from > >>>> dynamic dispatch not from type instability??? > >>>> > >>>> 3. Also, on my machine pretty_fast() is as fast as fast(). Why is this > >>>> so, if pretty_fast() is supposedly type unstable? > >>>> > >>>> As you can probably tell, I'm pretty confused on this. > >>>> > >>>> > >>>> On Sunday, April 3, 2016 at 6:34:09 AM UTC+10, Cedric St-Jean wrote: > >>>>> > >>>>> Thank you for the detailed explanation. I tried it out: > >>>>> > >>>>> function pretty_fast(features::Vector{Feature}) > >>>>> retval = 0.0 > >>>>> for i in 1 : length(features) > >>>>> if isa(features[i], A) > >>>>> x = evaluate(features[i]::A) > >>>>> elseif isa(features[i], B) > >>>>> x = evaluate(features[i]::B) > >>>>> else > >>>>> x = evaluate(features[i]) > >>>>> end > >>>>> retval += x > >>>>> end > >>>>> retval > >>>>> end > >>>>> > >>>>> On my laptop, fast runs in 10 microseconds, pretty_fast in 30, and > slow > >>>>> in 210. > >>>>> > >>>>> On Saturday, April 2, 2016 at 12:24:18 PM UTC-4, Yichao Yu wrote: > >>>>>> > >>>>>> On Sat, Apr 2, 2016 at 12:16 PM, Tim Wheeler <[email protected] > > > >>>>>> wrote: > >>>>>> > Thank you for the comments. In my original code it means the > >>>>>> > difference > >>>>>> > between a 30 min execution with memory allocation in the Gigabytes > >>>>>> > and a few > >>>>>> > seconds of execution with only 800 bytes using the second version. > >>>>>> > I thought under-the-hood Julia basically runs those if statements > >>>>>> > anyway for > >>>>>> > its dispatch, and don't know why it needs to allocate any memory. > >>>>>> > Having the if-statement workaround will be fine though. > >>>>>> > >>>>>> Well, if you have a lot of these cheap functions being dynamically > >>>>>> dispatched I think it is not a good way to use the type. Depending > on > >>>>>> your problem, you may be better off using a enum/flags/dict to > >>>>>> represent the type/get the values. > >>>>>> > >>>>>> The reason for the allocation is that the return type is unknown. It > >>>>>> should be obvious to see if you check your code with code_warntype. > >>>>>> > >>>>>> > > >>>>>> > On Saturday, April 2, 2016 at 7:26:11 AM UTC-7, Cedric St-Jean > >>>>>> > wrote: > >>>>>> >> > >>>>>> >> > >>>>>> >>> Therefore there's no way the compiler can rewrite the slow > version > >>>>>> >>> to the > >>>>>> >>> fast version. > >>>>>> >> > >>>>>> >> > >>>>>> >> It knows that the element type is a Feature, so it could produce: > >>>>>> >> > >>>>>> >> if isa(features[i], A) > >>>>>> >> retval += evaluate(features[i]::A) > >>>>>> >> elseif isa(features[i], B) > >>>>>> >> retval += evaluate(features[i]::B) > >>>>>> >> else > >>>>>> >> retval += evaluate(features[i]) > >>>>>> >> end > >>>>>> >> > >>>>>> >> and it would make sense for abstract types that have few > subtypes. > >>>>>> >> I > >>>>>> >> didn't realize that dispatch was an order of magnitude slower > than > >>>>>> >> type > >>>>>> >> checking. It's easy enough to write a macro generating this > >>>>>> >> expansion, too. > >>>>>> >> > >>>>>> >> On Saturday, April 2, 2016 at 2:05:20 AM UTC-4, Yichao Yu wrote: > >>>>>> >>> > >>>>>> >>> On Fri, Apr 1, 2016 at 9:56 PM, Tim Wheeler > >>>>>> >>> <[email protected]> > >>>>>> >>> wrote: > >>>>>> >>> > Hello Julia Users. > >>>>>> >>> > > >>>>>> >>> > I ran into a weird slowdown issue and reproduced a minimal > >>>>>> >>> > working > >>>>>> >>> > example. > >>>>>> >>> > Maybe someone can help shed some light. > >>>>>> >>> > > >>>>>> >>> > abstract Feature > >>>>>> >>> > > >>>>>> >>> > type A <: Feature end > >>>>>> >>> > evaluate(f::A) = 1.0 > >>>>>> >>> > > >>>>>> >>> > type B <: Feature end > >>>>>> >>> > evaluate(f::B) = 0.0 > >>>>>> >>> > > >>>>>> >>> > function slow(features::Vector{Feature}) > >>>>>> >>> > retval = 0.0 > >>>>>> >>> > for i in 1 : length(features) > >>>>>> >>> > retval += evaluate(features[i]) > >>>>>> >>> > end > >>>>>> >>> > retval > >>>>>> >>> > end > >>>>>> >>> > > >>>>>> >>> > function fast(features::Vector{Feature}) > >>>>>> >>> > retval = 0.0 > >>>>>> >>> > for i in 1 : length(features) > >>>>>> >>> > if isa(features[i], A) > >>>>>> >>> > retval += evaluate(features[i]::A) > >>>>>> >>> > else > >>>>>> >>> > retval += evaluate(features[i]::B) > >>>>>> >>> > end > >>>>>> >>> > end > >>>>>> >>> > retval > >>>>>> >>> > end > >>>>>> >>> > > >>>>>> >>> > using ProfileView > >>>>>> >>> > > >>>>>> >>> > features = Feature[] > >>>>>> >>> > for i in 1 : 10000 > >>>>>> >>> > push!(features, A()) > >>>>>> >>> > end > >>>>>> >>> > > >>>>>> >>> > slow(features) > >>>>>> >>> > @time slow(features) > >>>>>> >>> > fast(features) > >>>>>> >>> > @time fast(features) > >>>>>> >>> > > >>>>>> >>> > The output is: > >>>>>> >>> > > >>>>>> >>> > 0.000136 seconds (10.15 k allocations: 166.417 KB) > >>>>>> >>> > 0.000012 seconds (5 allocations: 176 bytes) > >>>>>> >>> > > >>>>>> >>> > > >>>>>> >>> > This is a HUGE difference! Am I missing something big? Is > there > >>>>>> >>> > a good > >>>>>> >>> > way > >>>>>> >>> > to inspect code to figure out where I am going wrong? > >>>>>> >>> > >>>>>> >>> This is because of type instability as you will find in the > >>>>>> >>> performance > >>>>>> >>> tips. > >>>>>> >>> Note that slow and fast are not equivalent since the fast > version > >>>>>> >>> only > >>>>>> >>> accept `A` or `B` but the slow version accepts any subtype of > >>>>>> >>> feature > >>>>>> >>> that you may ever define. Therefore there's no way the compiler > >>>>>> >>> can > >>>>>> >>> rewrite the slow version to the fast version. > >>>>>> >>> There are optimizations that can be applied to bring down the > gap > >>>>>> >>> but > >>>>>> >>> there'll always be a large difference between the two. > >>>>>> >>> > >>>>>> >>> > > >>>>>> >>> > > >>>>>> >>> > Thank you in advance for any guidance. > >>>>>> >>> > > >>>>>> >>> > > >>>>>> >>> > -Tim > >>>>>> >>> > > >>>>>> >>> > > >>>>>> >>> > > >>>>>> >>> > > >>>>>> >>> > >
