If you want better performance, first identify the hot spots by profiling 
your code. This is fairly easy in Julia: 
http://julia.readthedocs.org/en/latest/stdlib/profile/
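
For instance, a minimal profiling session could look like the sketch below. It 
assumes a recent Julia 1.x release, where the profiler lives in the Profile 
standard library; sum_of_squares is only a hypothetical stand-in for your own 
code.

```julia
using Profile

# Hypothetical workload standing in for the code you actually want to profile.
function sum_of_squares(n)
    total = 0.0
    for i in 1:n
        total += i^2
    end
    return total
end

sum_of_squares(10)             # run once first so compilation is not profiled
Profile.clear()                # drop any samples collected earlier
@profile sum_of_squares(10^8)  # collect samples while the call runs
Profile.print()                # print a tree of where the samples landed
```

The lines that collect the most samples are the ones worth your attention.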

Making your code less readable to gain a 0.01% speed increase in your whole 
program is not worth the pain.

Once the bottlenecks are identified, there are plenty of ways to get more 
speed. Explicit or implicit devectorization can be used 
(https://github.com/lindahua/Devectorize.jl), BLAS can sometimes be called 
directly with little or no change to the program structure, etc. You also have 
to be careful about types and memory layout in these parts.
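
As a rough illustration, here is a hand-devectorized loop next to its 
vectorized form, plus a direct BLAS call. This is only a sketch assuming a 
recent Julia 1.x release; sqdist_vec and sqdist_loop are hypothetical names, 
and the loop is written by hand rather than generated by Devectorize.jl.

```julia
using LinearAlgebra

# Vectorized form: allocates a temporary array for the intermediate result.
sqdist_vec(a, b) = sum((a .- b) .^ 2)

# Devectorized form: a single pass with no temporary array.
function sqdist_loop(a::Vector{Float64}, b::Vector{Float64})
    s = 0.0                        # Float64 accumulator keeps the loop type-stable
    for i in eachindex(a, b)       # throws if a and b have different lengths
        s += (a[i] - b[i])^2
    end
    return s
end

a = [1.0, 2.0, 3.0]
b = [1.0, 0.0, 0.0]
sqdist_vec(a, b) == sqdist_loop(a, b)   # both compute the same value

# Calling BLAS directly: y = 2.0 * x + y, updated in place.
x = [1.0, 2.0]
y = [10.0, 20.0]
BLAS.axpy!(2.0, x, y)
```

Declaring the arguments as Vector{Float64} is not what makes the loop fast; 
what matters is that the accumulator never changes type inside the loop.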

If you need even more speed, you can still leverage the SIMD instructions of 
your CPU and, of course, multicore/multinode parallelism 
(http://julia.readthedocs.org/en/latest/manual/performance-tips/).
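
For the SIMD part, a minimal sketch (assuming a recent Julia 1.x release; 
simd_sum is a hypothetical name) is to annotate a simple inner loop with the 
@simd macro, which allows the compiler to reorder the floating-point additions 
so it can vectorize them:

```julia
function simd_sum(x::Vector{Float64})
    s = 0.0
    # @inbounds removes bounds checks; @simd permits reordering the reduction.
    @inbounds @simd for i in eachindex(x)
        s += x[i]
    end
    return s
end

simd_sum([1.0, 2.0, 3.0, 4.0])
```

Because the additions may be reordered, the result can differ in the last bits 
from a strictly sequential sum on general floating-point data.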

But as I said, before optimizing, finish your program so that you will be able 
to understand it perfectly 6 months from now. Then profile it properly and, if 
it is too slow for you, optimize the real bottlenecks (not the imagined ones).
