> Is it possible that PyPy is multithreading something? I struggle to
> believe it's so much faster - although I'd believe it if it was about the
> same.
My bad: I meant PyPI, not PyPy, i.e. I was talking about a module for CPython/NumPy. Still, I've checked that: only one thread is active.

As another test, I've compiled the current version in C++ (using a thin matrix class wrapper that allows 2D indexing notation, so not much conversion is needed anymore; I've appended the code to the gist). With gcc -O2, it takes 1.55s. With clang++, which is probably more interesting here, I got 1.7s with -O2 and 4.1s without optimization. (Do clang++ and Julia use the same optimizer?)

Here's an interesting comparison:

> https://gist.github.com/simonster/6195af68c6df33ca965d

Simon's hint gave your version a *huge* improvement, down to *1.5s*. Now the Julia code is finally the fastest. Great! (Well, if I apply the same modification to the C++ version, it goes down to 1.0s, but that's okay, I guess. The point was to check whether NumPy could be beaten.)

The final version is here: https://gist.github.com/phillipberndt/7dc0aed7eb855f900f0d#file-magic-jl

Thanks again, everyone!

Cheers,
Phillip
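The thin matrix wrapper used for the C++ port is only in the gist, not in this message. As a rough illustration, here is a hypothetical minimal sketch of what such a wrapper might look like. It assumes row-major storage in a single contiguous `std::vector` (matching NumPy's default C order; note Julia arrays are column-major). The class name `Matrix` and its members are illustrative and not taken from the gist.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical thin wrapper: a rows x cols matrix stored in one contiguous,
// row-major std::vector, exposing 2D indexing through operator().
template <typename T>
class Matrix {
public:
    Matrix(std::size_t rows, std::size_t cols, T init = T())
        : rows_(rows), cols_(cols), data_(rows * cols, init) {}

    // m(i, j) maps to the flat offset i * cols + j (row-major order).
    // The assert is a debug bounds check; it is compiled out with -DNDEBUG.
    T& operator()(std::size_t i, std::size_t j) {
        assert(i < rows_ && j < cols_);
        return data_[i * cols_ + j];
    }
    const T& operator()(std::size_t i, std::size_t j) const {
        assert(i < rows_ && j < cols_);
        return data_[i * cols_ + j];
    }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t rows_, cols_;
    std::vector<T> data_;
};
```

Usage would look like `Matrix<double> m(3, 4); m(2, 3) = 1.0;`. With row-major storage, the innermost loop should run over the column index `j` so that memory is accessed contiguously; in Julia, which is column-major, the opposite loop order is the cache-friendly one.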
