If you're seeking even better performance, my main suggestions would be:
(1) Allocate the output M outside of the core algorithm, and pass it as an 
input, i.e.,
function magic!(M::Matrix{Int})
    size(M,1) == size(M,2) || error("Input matrix must be square")
    # now the algorithm
end
That way your algorithm shouldn't need to allocate any memory.
(2) Run @time (for i = 1:100; magic!(M); end). Did it allocate any memory? Then
you have a problem. Use the profiler, or run julia with
--track-allocation=user, to find out where it occurs.
(3) Even if it's not allocating, you may have a bottleneck. Use the profiler to 
find it.
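
As a rough sketch of steps (1) and (2), assuming a current Julia 1.x: the kernel below is only a stand-in (the textbook Siamese method for odd orders), not the algorithm from either gist, and allocated_bytes is a hypothetical helper name. The point is just the shape: the caller allocates M once, magic! only writes into it, and the allocation check runs after a warm-up call so that compilation is not counted.

```julia
# Stand-in kernel (assumption: NOT the gist's algorithm). Fills a
# preallocated square matrix of odd order using the Siamese method.
function magic!(M::Matrix{Int})
    n = size(M, 1)
    n == size(M, 2) || error("Input matrix must be square")
    isodd(n) || error("This sketch only handles odd orders")
    fill!(M, 0)                    # 0 marks a cell as not yet filled
    i, j = 1, div(n + 1, 2)        # start in the middle of the top row
    for k in 1:n^2
        M[i, j] = k
        # Step up and to the right, wrapping around the edges.
        ni = i == 1 ? n : i - 1
        nj = j == n ? 1 : j + 1
        if M[ni, nj] != 0          # target taken: step down one row instead
            ni = i == n ? 1 : i + 1
            nj = j
        end
        i, j = ni, nj
    end
    return M
end

# Hypothetical helper: bytes allocated by one call, measured after a warm-up.
function allocated_bytes(M::Matrix{Int})
    magic!(M)                      # warm-up so compilation is not measured
    return @allocated magic!(M)
end

M = Matrix{Int}(undef, 5, 5)       # allocated once, outside the kernel
println(allocated_bytes(M))        # ideally 0 for an allocation-free kernel
```

If the reported number is not zero, that is the cue to reach for the profiler or --track-allocation=user as described above.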

--Tim

On Tuesday, August 26, 2014 01:53:22 AM Phillip Berndt wrote:
> Iain Dunning wrote:
> > Updated gist for the doubly-even order case
> > https://gist.github.com/IainNZ/9b5f1eb1bcf923ed02d9
> 
> Nice, thanks for your answers, hints and especially to Iain, thanks for
> your example code. For doubly even numbers it indeed beats Matlab:
> 
> *N=3..999, odd numbers only:*
> 
> Iain's version: elapsed time: 3.570507583 seconds (1333360560 bytes
> allocated, 7.64% gc time)
> Matlab: Elapsed time is 3.026155 seconds.
> Python: 1 loops, best of 3: 875 ms per loop
> 
> *N=4..1000, doubly even numbers only:*
> 
> Iain's version: elapsed time: 0.35992636 seconds (670684544 bytes
> allocated, 38.27% gc time)
> Matlab: Elapsed time is 0.961886 seconds.
> Python: 1 loops, best of 3: 263 ms per loop
> 
> Still, Python (Numpy and cPython, not pypy) clearly performs better for odd
> numbers. I'll look further into that and appreciate any further suggestions
> to improve speed ;) I tried to implement a fast version for the non-doubly
> even numbers at
> https://gist.github.com/phillipberndt/7dc0aed7eb855f900f0d/8611596eebac1291a6e5869242c880fa790d4e1c
> with these results:
> 
> *N=6..998, non-doubly even numbers only*
> 
> My new version: elapsed time: 0.988820401 seconds (833368960 bytes
> allocated, 16.78% gc time)
> Matlab: Elapsed time is 0.933938 seconds.
> Python: 1 loops, best of 3: 503 ms per loop
> 
> Did I miss any obvious optimizations in the latter version?
> 
> Since you've tested your code with N=10,000, I thought that a comparison
> with large N might also be interesting. For N=9,980..10,000, the latest
> version gives
> 
> Iain's & my non-doubly even version: elapsed time: 24.108401104 seconds
> (17764425968 bytes allocated, 1.12% gc time)
> Matlab: Elapsed time is 32.327293 seconds.
> Python: 1 loops, best of 3: 9.52 s per loop
> 
> Clearly beats Matlab, but Python's still far ahead.
> 
> - Phillip
