Hi A followup message ... I managed to increase the speed of my little prime factorization program by a bit more than a factor of two, the performance win is hardware-dependent.
After I learned how to profile the code, I discovered a few things: - 70% of the time was taken in math/big.Mod function (I already knew this) - that function was calling QuoRem which was calling nat.divLarge, meaning that it was calculating the Mod by actually doing the division and then finding the remainder - the big surprise was that of the 65% percent of the runtime taken by divLarge and routines called by it, more than half was going to what is essentially memory management Part of this was going to runtime.makeslice and ultimately runtime.mallocgc, the rest was going to sync Pool put and get calls. divLarge needs to create temporaries, and apparently, in order to relieve pressure on the garbage collector, divLarge makes use of a sync pool of "big.nat" objects. I guess it's done this way (via sync pool) to make divLarge thread-safe. I didn't need to do the division, and I didn't need (not yet anyway) thread safety, and my calculations are all like "a mod N" where a*a < N, in that case one can do Barrett reduction to find a mod N, and since it's millions of a mod N with the same N, the constants for N can be precomputed. I made a pair of global temporaries for the function in the package, which avoids all the calls to sync pool but now it's not thread safe I guess. There were a number of other optimisations that had to do with avoiding makeslice, one was to avoid reusing a particular big.Int variable for first small and then large values ... I defined two temporaries, one to hold values for large calcs and another to hold for smaller, avoiding work in the "make" call as there will always be enough capacity to re-use the variable in those cases. A final silly one that saved 10% of the run time an eliminated almost all garbage collection (after the previously mentioned one) was swapping for (loop over lots of iterations) { var tmp1 big.Int ... for var tmp1 big.Int for (loop over lots of iterations) { ... I had naively assumed that as long as I was executing the loop, it was the same "tmp1", however it was allocating a new tmp1 on each trip around the loop. 1.2 GB of memory allocated in a 25-second run :-) It's been a lot of fun, thanks for the discussions, and I am very impressed with the tooling. pprof absolutely rocks. JT -- You received this message because you are subscribed to the Google Groups "golang-nuts" group. To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.