There is one thing I see as a potential problem.
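The outer loop over *i* increments the first index, but Julia stores arrays in column-major order, so the first index is the one that varies fastest in memory. That means the inner loops are stepping through elements that are not contiguous, and any speed gain from the CPU cache is potentially lost. Reordering the loops so that the first index is iterated in the innermost loop should help.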
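To illustrate the loop-order point, here is a minimal sketch for a 2-D array. The function names and the matrix `A` are made up for the example and are not taken from your code; the same idea should carry over to more indices, with the first index innermost and the last index outermost.

```julia
# Hypothetical example: both functions compute the same sum,
# only the order of the loops differs.

# First index in the OUTER loop: each step of the inner loop jumps
# a whole column ahead in memory (strided access).
function sum_first_index_outer(A::AbstractMatrix)
    s = zero(eltype(A))
    for i in axes(A, 1)
        for j in axes(A, 2)
            s += A[i, j]
        end
    end
    return s
end

# First index in the INNER loop: each step of the inner loop moves to
# the next element in memory (contiguous access).
function sum_first_index_inner(A::AbstractMatrix)
    s = zero(eltype(A))
    for j in axes(A, 2)
        for i in axes(A, 1)
            s += A[i, j]
        end
    end
    return s
end

# Small matrix just to check that the two versions agree.
A = reshape(collect(1.0:12.0), 3, 4)
```

Both versions return the same value, so the reordering does not change the result. To see whether it matters for your sizes, you could time the two functions on a large matrix with `@time` (after a warm-up call so compilation is not included). How big the difference is will depend on the array size and your hardware.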