Try running with --track-allocation=user and see if it's allocating memory on that line. If so, you have a type problem. http://docs.julialang.org/en/latest/manual/performance-tips/ (2nd and 3rd sections)
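As a rough sketch of the kind of type problem the allocation report can point to (the function names below are hypothetical and are not taken from the attached crossJoinFilter.jl): an accumulator that starts as an Int and is then updated with Float64 values changes type inside the loop. Starting Julia as `julia --track-allocation=user script.jl` writes per-line allocation counts to `.mem` files next to the source; the `@allocated` macro used here is a quick way to compare two variants from within a session. Whether the unstable variant actually allocates depends on the Julia version, so no expected numbers are shown.

```julia
# Hypothetical illustration only; not the code from the original post.

# Type-unstable: `total` starts as an Int and becomes a Float64
# after the first addition, so its type depends on the loop.
function sum_unstable(v::Vector{Float64})
    total = 0
    for x in v
        total += x
    end
    return total
end

# Type-stable: `total` has the element type of `v` from the start.
function sum_stable(v::Vector{Float64})
    total = zero(eltype(v))
    for x in v
        total += x
    end
    return total
end

v = rand(1000)

# Call each function once first so that compilation is not measured.
sum_unstable(v)
sum_stable(v)

# Bytes allocated by one call of each variant.
println("unstable: ", @allocated sum_unstable(v))
println("stable:   ", @allocated sum_stable(v))
```

The same idea applies to integer counters: initializing with `zero(T)` and incrementing with `one(T)` keeps the variable at a single type `T` for the whole loop.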
--Tim

On Thursday, September 18, 2014 10:44:58 AM G. Patrick Mauroy wrote:
> No change.
> I over-typed everything to avoid such type mismatches, particularly when
> experimenting with other integer types. So unless I missed something
> somewhere, it should not be the case.
> I suspect something like the compiler not recognizing that the incrementing
> variables should be kept in registers. Unless it is the inherent speed of
> incrementing, but I doubt it; I had some faster runs at some points...
>
> On Thursday, September 18, 2014 12:58:12 PM UTC-4, John Myles White wrote:
> > 1 has type Int. If you add it to something with a different type, you
> > might be causing type instability. What happens if you replace the literal
> > 1 with one(T) for the type you're working with?
> >
> > -- John
> >
> > On Sep 18, 2014, at 9:56 AM, G. Patrick Mauroy <gpma...@gmail.com> wrote:
> >
> > Profiling shows incrementing integers by 1 (i += 1) being the bottleneck.
> > Within the same loop are other statements that take much less time.
> >
> > In my performance-optimizing zeal, I over-typed the hell out of everything
> > to attempt squeezing performance to the last ounce.
> > Some of this zeal did help in other parts of the code, but I am now
> > struggling to make sense of spending most of the time incrementing by 1.
> > I suspect the problem is over-typing zeal because I seem to recall having
> > a version not so strongly typed that ran consistently 2-3 times faster for
> > default Int (but not for other Int types). It was late at night so I don't
> > recall the details!
> >
> > I am pretty confident the increment variables are typed so there should
> > not be any undue cast.
> >
> > Any idea?
> >
> > Here is what my code conceptually looks like:
> >
> >> # Global static type declaration ahead seems to have helped (as opposed to deriving from eltype of underlying array at the beginning of function being profiled).
> >> IdType = Int # Int64
> >> DType = Int
> >> function my_fct(dt1, dt2)
> >>   # Convert is for sure unnecessary for default Int types but more rigorous and necessary in some parts of code when experimenting with other IdType & DType types.
> >>   const oneIdType = convert(IdType, 1) # Used to make sure I increment with a value of the proper type, again useless with IdType = Int.
> >>   const zeroIdType = convert(IdType, 0)
> >>   i::IdType = zeroIdType; i2Match::IdType = zeroIdType; i2Lower::IdType = zeroIdType; i2Upper::IdType = oneIdType;
> >>   ...
> >>   # Critical loop.
> >>   i2Match = i2Lower
> >>   while i2Match < i2Upper
> >>     @inbounds i2MatchD2 = dt2D2[i2Match]
> >>     if i1D <= i2MatchD2
> >>       i += oneIdType # SLOW!
> >>       @inbounds i2MatchD1 = dt2D1[i2Match]
> >>       @inbounds resid1[i] = i1id1
> >>       ...
> >>     end
> >>     i2Match += oneIdType # SLOW!
> >>   end
> >>   ...
> >> end
> >
> > The undeclared types are 1-dim arrays of the appropriate type -- basically
> > all Int in this configuration.
> >
> > Enclosed is the full stand-alone code if anyone cares to try.
> > On my machines, one function call is in the range of 0.05 to 0.1 sec,
> > highly depending upon garbage collection, so profiling with 100 runs is
> > done in about 10 sec.
> >
> > Thanks.
> >
> > Patrick
> >
> > <crossJoinFilter.jl>