I've filed an issue for this:
https://github.com/JuliaLang/julia/issues/5409

As Jameson suggested, not promoting the operands in 32-bit integer division/remainder operations looks like it will almost completely fix this performance problem without introducing any danger of overflow.
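Why skipping the promotion is safe: for a 32-bit quotient, |n/d| <= |n| whenever |d| >= 1, and for a remainder, |n%d| < |d|. Both results therefore always fit back in 32 bits. The single exception is INT32_MIN / -1, and that case overflows the 32-bit result whether or not the operands were widened first, because the 64-bit answer (2^31) gets truncated back to 32 bits anyway. The sketch below illustrates this in C as a neutral stand-in, not as Julia's actual implementation. The helper names (`div_defined`, `div32`, `rem32`, `div_promoted`, `rem_promoted`) are hypothetical. It assumes C99 truncating division, which has the same semantics at both widths.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical guard: false for the inputs where a 32-bit quotient or
   remainder is undefined in C: division by zero, and INT32_MIN / -1
   (INT32_MIN % -1 is also undefined in C and traps on x86). */
static bool div_defined(int32_t n, int32_t d) {
    return d != 0 && !(n == INT32_MIN && d == -1);
}

/* Divide directly in 32 bits, with no widening of the operands. */
static int32_t div32(int32_t n, int32_t d) {
    assert(div_defined(n, d));
    return n / d;
}

/* Remainder directly in 32 bits; |result| < |d|, so it always fits. */
static int32_t rem32(int32_t n, int32_t d) {
    assert(div_defined(n, d));
    return n % d;
}

/* Promote both operands to 64 bits, divide, then narrow back. */
static int32_t div_promoted(int32_t n, int32_t d) {
    assert(div_defined(n, d));
    return (int32_t)((int64_t)n / (int64_t)d);
}

/* Promote both operands to 64 bits, take the remainder, then narrow. */
static int32_t rem_promoted(int32_t n, int32_t d) {
    assert(div_defined(n, d));
    return (int32_t)((int64_t)n % (int64_t)d);
}
```

On every input the guard accepts, the 32-bit and promoted versions return the same value, so the widening only costs time (a 64-bit divide is typically slower than a 32-bit one on x86-64) without buying any extra safety.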
