I was experimenting with unsafe_load() indexing and "@inbounds begin ... 
end" speedups, and found that they are practically of the same speed, and 
are among the fastest versions. But the Julia code optimization turned out 
to be very fragile. Here is the code which can use only an 8 bytes array g 
for the gaps between divisors (at similar speed as an Int array).

https://gist.github.com/LaszloHars/c691e5c650d8b22bc8a3

This code beats the built in factor() function as it is written. However, 
if we replace the initialization d = d0 in the middle of the function with 
d += g[j0] as noted in the corresponding comment in the code, the running 
time doubles. Since this instruction is outside of loops, the speed 
difference should not be noticeable. It looks like the Julia code 
optimization gets screwed up with this constant expression. Weird!

Reply via email to