The literal 1 has type Int. If you add it to a value of a different integer type, the result gets promoted, so you might be causing type instability. What happens if you replace the literal 1 with one(T), where T is the type you're working with?
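To illustrate, here is a minimal sketch. The function names are hypothetical and not taken from your attached file, and Int16 is only a stand-in for "some integer type other than Int":

```julia
# Hypothetical illustration; names are not from crossJoinFilter.jl.

# The counter starts as Int16 but is incremented with the Int literal 1.
# Int16 + Int promotes to Int, so `i` changes type after the first
# iteration and the return type depends on the value of `n`.
function count_unstable(n)
    i = zero(Int16)
    for k in 1:n
        i += 1
    end
    return i
end

# Same loop, but the increment is built with one(T), so the counter
# keeps the type T for the whole loop.
function count_stable(T, n)
    i = zero(T)
    step = one(T)
    for k in 1:n
        i += step
    end
    return i
end
```

With these definitions, typeof(count_unstable(0)) is Int16 while typeof(count_unstable(3)) is Int, whereas count_stable(Int16, 3) stays Int16 throughout. If your Julia version has @code_warntype, running it on the unstable version should flag the counter.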
-- John

On Sep 18, 2014, at 9:56 AM, G. Patrick Mauroy <[email protected]> wrote:

> Profiling shows that incrementing integers by 1 (i += 1) is the bottleneck.
>
> The other statements within the same loop take much less time.
>
> In my performance-optimizing zeal, I over-typed the hell out of everything to
> try to squeeze out the last ounce of performance.
> Some of this zeal did help in other parts of the code, but now I am struggling
> to make sense of spending most of the time incrementing by 1.
> I suspect the problem is the over-typing zeal, because I seem to recall having a
> version not so strongly typed that ran consistently 2-3 times faster for
> default Int (but not for other Int types). It was late at night so I don't
> recall the details!
>
> I am pretty confident the increment variables are typed, so there should not
> be any undue cast.
>
> Any idea?
>
> Here is what my code conceptually looks like:
>
> # Global static type declaration ahead seems to have helped (as opposed to
> # deriving from eltype of underlying array at the beginning of function being
> # profiled).
> IdType = Int # Int64
> DType = Int
> function my_fct(dt1, dt2)
>   # Convert is for sure unnecessary for default Int types but more rigorous
>   # and necessary in some parts of code when experimenting with other IdType &
>   # DType types.
>   const oneIdType = convert(IdType, 1) # Used to make sure I increment with a value of the proper type, again useless with IdType = Int.
>   const zeroIdType = convert(IdType, 0)
>   i::IdType = zeroIdType; i2Match::IdType = zeroIdType; i2Lower::IdType = zeroIdType; i2Upper::IdType = oneIdType;
>   ...
>   # Critical loop.
>   i2Match = i2Lower
>   while i2Match < i2Upper
>     @inbounds i2MatchD2 = dt2D2[i2Match]
>     if i1D <= i2MatchD2
>       i += oneIdType # SLOW!
>       @inbounds i2MatchD1 = dt2D1[i2Match]
>       @inbounds resid1[i] = i1id1
>       ...
>     end
>     i2Match += oneIdType # SLOW!
>   end
>   ...
> end
>
> The undeclared types are 1-dim arrays of the appropriate type -- basically
> all Int in this configuration.
>
> Enclosed is the full stand-alone code if anyone cares to try.
> On my machines, one function call is in the range of 0.05 to 0.1 sec, highly
> depending upon garbage collection, so profiling with 100 runs is done in
> about 10 sec.
>
> Thanks.
>
> Patrick
>
> <crossJoinFilter.jl>
