Hi

A followup message ... I managed to increase the speed of my little prime 
factorization program by a bit more than a factor of two, the performance 
win is hardware-dependent.  

After I learned how to profile the code, I discovered a few things:


   - 70% of the time was taken in math/big.Mod function (I already knew 
   this)
   - that function was calling QuoRem which was calling nat.divLarge, 
   meaning that it was calculating the Mod by actually doing the division and 
   then finding the remainder
   - the big surprise was that of the 65% percent of the runtime taken by 
   divLarge and routines called by it, more than half was going to what is 
   essentially memory management

Part of this was going to runtime.makeslice and ultimately 
runtime.mallocgc, the rest was going to sync Pool put and get calls. 
 divLarge needs to create temporaries, and apparently, in order to relieve 
pressure on the garbage collector, divLarge makes use of a sync pool of 
"big.nat" objects.  I guess it's done this way (via sync pool) to make 
divLarge thread-safe.

I didn't need to do the division, and I didn't need (not yet anyway) thread 
safety, and my calculations are all like "a mod N" where a*a < N, in that 
case one can do Barrett reduction to find a mod N, and since it's millions 
of a mod N with the same N, the constants for N can be precomputed.  I made 
a pair of global temporaries for the function in the package, which avoids 
all the calls to sync pool but now it's not thread safe I guess.

There were a number of other optimisations that had to do with avoiding 
makeslice, one was to avoid reusing a particular big.Int variable for first 
small and then large values ... I defined two temporaries, one to hold 
values for large calcs and another to hold for smaller, avoiding work in 
the "make" call as there will always be enough capacity to re-use the 
variable in those cases. 

A final silly one that saved 10% of the run time an eliminated almost all 
garbage collection (after the previously mentioned one) was swapping

for (loop over lots of iterations) {
   var tmp1 big.Int
   ... 

for 

var tmp1 big.Int
for (loop over lots of iterations) {
   ... 

I had naively assumed that as long as I was executing the loop, it was the 
same "tmp1", however it was allocating a new tmp1 on each trip around the 
loop.  1.2 GB of memory allocated in a 25-second run :-)

It's been a lot of fun, thanks for the discussions, and I am very impressed 
with the tooling.  pprof absolutely rocks.

 JT

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to