Libdivide (http://libdivide.com/) replaces the DIV instruction at
runtime with a sequence of shifts and MULs, which executes much
faster. It works by taking a number (the divisor, or
"denominator") and preprocessing it once, after which dividing by
it can be ~8 times faster (by my own measurements). I use it to
divide CPU cycle counts by the CPU frequency (i.e., two big, ugly
numbers) to obtain wall-clock time.
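The preprocessing idea can be sketched in C as follows. This is not
libdivide's code or API: it is the classic Granlund and Montgomery
round-up method for unsigned 64-bit division, and the names
`struct fastdiv_u64`, `fastdiv_u64_gen` and `fastdiv_u64_do` are
hypothetical. It assumes GCC or Clang on x86-64 (for
`unsigned __int128` and `__builtin_clzll`). The one expensive 128-bit
division runs once per divisor; each later division costs a
multiply-high, a subtraction, an addition and two shifts.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical precomputed divisor (NOT libdivide's struct). */
struct fastdiv_u64 {
    uint64_t magic;  /* m' = floor(2^64 * (2^l - d) / d) + 1, l = ceil(log2 d) */
    unsigned sh1;    /* min(l, 1) */
    unsigned sh2;    /* max(l - 1, 0) */
};

/* Preprocessing: done once per divisor d. Assumes GCC/Clang on x86-64. */
static struct fastdiv_u64 fastdiv_u64_gen(uint64_t d)
{
    assert(d != 0);
    /* l = ceil(log2(d)); d == 1 is special-cased since clz(0) is undefined. */
    unsigned l = (d == 1) ? 0u : 64u - (unsigned)__builtin_clzll(d - 1);
    /* 2^l - d; when l == 64, 2^64 - d is computed by unsigned wraparound. */
    uint64_t diff = (l == 64) ? (0 - d) : ((UINT64_C(1) << l) - d);
    struct fastdiv_u64 r;
    /* Since 2^l - d < d, the quotient fits in 64 bits. */
    r.magic = (uint64_t)((((unsigned __int128)diff) << 64) / d) + 1;
    r.sh1 = (l < 1) ? l : 1;
    r.sh2 = (l > 0) ? l - 1 : 0;
    return r;
}

/* Hot path: computes n / d using only multiply, subtract, add and shifts. */
static inline uint64_t fastdiv_u64_do(uint64_t n, const struct fastdiv_u64 *dv)
{
    /* t = high 64 bits of the 128-bit product magic * n (t <= n). */
    uint64_t t = (uint64_t)(((unsigned __int128)dv->magic * n) >> 64);
    /* Halving (n - t) first keeps the sum within 64 bits. */
    return (t + ((n - t) >> dv->sh1)) >> dv->sh2;
}
```

For the cycles-to-wall-time case, one would call `fastdiv_u64_gen()`
once with the CPU frequency in Hz and then `fastdiv_u64_do()` on each
cycle count; with a hypothetical 2.4 GHz clock, 7200000000 cycles
yields 3 (seconds).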
Of course, this only helps with division by a divisor known at
runtime; when the divisor is known at compile time, the compiler
already performs the same transformation.
* It's a header-only library, so I ported the code itself
* I tried to keep my port as mechanical as possible; I can't
really say I know what's going on there
* I only ported the POSIX x86-64 code because that's what I needed
* Signedness is a big issue; be sure you use the right (signed
  vs. unsigned) variant
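To illustrate why the signed and unsigned variants are not
interchangeable, here is a small sketch using plain C division rather
than libdivide itself; `div_as_signed` and `div_as_unsigned` are
hypothetical helpers. The same 64-bit pattern means a different number
depending on signedness, so passing a negative value to an unsigned
divider silently gives a huge, wrong quotient.

```c
#include <assert.h>
#include <stdint.h>

/* Signed division: C99 truncates toward zero, so -10 / 3 == -3. */
static int64_t div_as_signed(int64_t n, int64_t d)
{
    assert(d != 0);
    return n / d;
}

/* Same bits reinterpreted as unsigned: -10 becomes 2^64 - 10. */
static uint64_t div_as_unsigned(int64_t n, int64_t d)
{
    assert(d != 0);
    return (uint64_t)n / (uint64_t)d;
}
```

For non-negative inputs both helpers agree; they diverge as soon as the
numerator is negative.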