> (1) allocate the output M outside of the core algorithm, and pass it as an
> input, i.e.,
>
I did that, though it could be argued that this is cheating, since the competitors also have to allocate an array on every call. With that version (plus a few more minor optimizations: storing intermediate values in the for loops, using column-major indexing, and @simd) [https://gist.github.com/phillipberndt/7dc0aed7eb855f900f0d/21cce76664bdc59f6203ff6f3496e80e256f54cb], the overall time for the N=3..1000 test case is down to 3.67s.

> (2) @time (for i = 1:100; magic!(M); end). Did it allocate any memory? Then
> you have a problem. Use the profiler, or run julia with
> --track-allocation=user, to find out where it occurs.

It does, about 3 MB on line 2 (if n % 2 == 1). That doesn't make much sense, so I guess the profiler interfered with the optimizer here?! I doubt that getting rid of the 3 MB would gain another second, though.

> (3) Even if it's not allocating, you may have a bottleneck. Use the
> profiler to find it.

The line where the most time is spent is line 11, which fills the array in the odd case. I don't see how it could be optimized any further, so that's probably as far as one gets?!

- Phillip
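For anyone following along, here is a rough sketch of the pattern discussed in this thread: fill a preallocated matrix in place, loop over columns in the outer loop, hoist intermediate values out of the inner loop, and then check whether the call allocates. It is not the code from the gist. The name magic_odd! and the helper measure are made up for illustration, only the odd case is covered (using the well-known closed form for the siamese construction), @simd is left out, and the syntax assumes Julia 1.x.

```julia
# Hypothetical stand-in for the odd branch of an in-place magic-square routine.
# Fills the preallocated square matrix M (odd size) and returns it.
function magic_odd!(M::Matrix{Int})
    n = size(M, 1)
    (size(M, 2) == n && isodd(n)) ||
        throw(ArgumentError("M must be square with odd size"))
    half = n ÷ 2
    @inbounds for j in 1:n        # columns in the outer loop: Julia is column-major
        a = j - 1 + half          # intermediate values that depend only on j,
        b = 2j - 2                # computed once per column
        for i in 1:n
            M[i, j] = n * mod(i + a, n) + mod(i + b, n) + 1
        end
    end
    return M
end

# Wrap the measurement in a function so global-scope effects are not counted.
measure(M) = @allocated magic_odd!(M)

M = Matrix{Int}(undef, 5, 5)      # allocated once, outside the core algorithm
measure(M)                        # warm-up call, so compilation is not measured
println("bytes allocated per call: ", measure(M))
```

If the reported number is nonzero, running julia with --track-allocation=user, as suggested above, should help narrow down which line is responsible.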
