Hi,

On Tue, Feb 28, 2017 at 3:04 PM, Sebastian K <sebastiankas...@googlemail.com> wrote:
> Yes you are right. There is no need to add that line. I deleted it. But the
> measured heap peak is still the same.

You're applying the naive matrix multiplication algorithm, which is ideal for minimizing memory use during the computation, but poor for speed, because it makes little use of the CPU cache:

https://en.wikipedia.org/wiki/Matrix_multiplication_algorithm

The NumPy version is likely calling into a highly optimized compiled routine for matrix multiplication, which can load chunks of the matrices at a time to speed up the computation, probably at the cost of some extra working memory. If you really need minimum heap usage and don't care about a slowdown of one or more orders of magnitude, then you might need to use the naive method, maybe implemented in Cython / C.

Cheers,

Matthew
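
P.S. In case it helps, here is a minimal sketch of what I mean by the naive method, written in plain Python over NumPy arrays. The function name naive_matmul is only for illustration and is not part of NumPy. It allocates nothing beyond the output array, and it will be far slower than the built-in product. A Cython or C version would follow the same three loops.

```python
import numpy as np


def naive_matmul(a, b):
    """Triple-loop matrix product (illustrative helper, not a NumPy function).

    The only array allocated here is the output; every other value
    is a scalar accumulator.
    """
    n, k = a.shape
    k2, m = b.shape
    if k != k2:
        raise ValueError("inner dimensions do not match")
    out = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            acc = 0.0
            for p in range(k):
                acc += a[i, p] * b[p, j]
            out[i, j] = acc
    return out


# Check the result against NumPy's own (optimized) product.
rng = np.random.default_rng(0)
a = rng.random((20, 30))
b = rng.random((30, 10))
assert np.allclose(naive_matmul(a, b), a @ b)
```

Measuring the heap peak of both versions on your own matrices should show whether the optimized routine's working memory accounts for the difference you are seeing.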