Hello everyone, I'm here again to ask a naive question about NumPy
performance.

As far as I know, NumPy's vectorized operations are very efficient because
they use SIMD instructions and multiple threads, compared to index-style
programming (using a "for" loop and assigning each element by its index
in the array).

I'm wondering how fast NumPy can be, so I did some experiments. Take this
simple task as an example:
    import numpy as np
    a = np.random.rand(10_000_000)
    b = np.random.rand(10_000_000)
    c = a + b

To check the performance, I wrote a simple C++ implementation that adds two
arrays, also using multiple threads (compiled with -O3 -mavx2). I found
that the C++ implementation is slightly faster than NumPy (each was run
100 times to get reasonably reliable statistics).
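
For reference, the NumPy side of such a measurement could be sketched as
below. This is only an assumed setup: the helper name time_add and the use
of timeit are illustrative, not the exact script behind the numbers above.

```python
import timeit

import numpy as np


def time_add(n=10_000_000, repeats=100):
    """Return the mean wall-clock seconds of one `a + b` over `repeats` runs.

    Hypothetical helper for illustration; `n` and `repeats` mirror the
    array size and repetition count mentioned in the post.
    """
    rng = np.random.default_rng(0)
    a = rng.random(n)
    b = rng.random(n)
    # Only the addition is timed; array creation happens outside the timer.
    total = timeit.timeit(lambda: a + b, number=repeats)
    return total / repeats
```

Calling time_add() with the defaults times the 10 million element case;
note that each run also allocates a fresh result array, which a C++ loop
writing into a preallocated buffer would not do.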

*Here comes the first question: where does this efficiency gap come from?*
I guess it is because NumPy needs to load the shared object, find the ufunc
wrapper, and only then execute the underlying computation. Am I right? Am I
missing something here?

Then I did another experiment with the statement d = a * b + c, where a,
b, c and d are all NumPy arrays. I implemented the same logic in C++ and
found that C++ is 2 times faster than NumPy on average (again running each
100 times).

I guess this is because in Python we first calculate:
    temporary_var = a * b
and then:
    d = temporary_var + c
so we have an unnecessary memory transfer overhead. Since each array is
very large, NumPy needs to write temporary_var to memory and then read it
back into cache.

However, in C++ we can just write d[i] = a[i] * b[i] + c[i], which creates
no temporary array and so avoids the memory transfer penalty.
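
To make the two-step evaluation concrete, here is a minimal sketch. The
function names are hypothetical. The second variant uses the ufunc out=
argument to reuse one buffer; as far as I understand, this only removes the
extra allocation and still makes two passes over the data, so it is not
equivalent to the single fused loop in C++.

```python
import numpy as np


def mul_add_with_temporary(a, b, c):
    # What `d = a * b + c` does: the product is written to a new array,
    # then read back to compute the sum into a second new array.
    temporary_var = a * b
    return temporary_var + c


def mul_add_reuse_buffer(a, b, c, out=None):
    # Reuse a single output buffer for both steps, so no separate
    # temporary array is allocated. Still two passes over memory.
    if out is None:
        out = np.empty_like(a)
    np.multiply(a, b, out=out)
    np.add(out, c, out=out)
    return out
```

Both functions return the same values; the difference is only in how many
arrays are allocated along the way.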

*So my second question: is there a way to avoid this kind of overhead?*
I've learned that in NumPy we can create our own ufunc with *frompyfunc*,
but it seems to use neither SIMD optimization nor multiple threads, since
it is 100 times slower than the *"d = a * b + c"* way.
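
For context, the frompyfunc attempt looked roughly like the sketch below
(the names mul_add_scalar and mul_add_ufunc are illustrative). The wrapped
function is an ordinary Python function called once per element, and the
result comes back as an object array, which may explain much of the
slowdown.

```python
import numpy as np


def mul_add_scalar(x, y, z):
    # Plain Python function; invoked once for every element.
    return x * y + z


# Wrap it as a ufunc taking 3 inputs and producing 1 output.
mul_add_ufunc = np.frompyfunc(mul_add_scalar, 3, 1)

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])
c = np.array([0.5, 0.5, 0.5])

d_obj = mul_add_ufunc(a, b, c)   # result has dtype=object
d = d_obj.astype(np.float64)     # convert back to a float array
```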