Thanks a lot for answering this question, but I still have some uncertainties.

I'm trying to improve time efficiency as much as possible, so memory allocation is not my main worry; in my opinion it won't cost much. My central concern is memory access, because of the cache-miss penalty.

In your snippet there are four accesses to whole arrays:

  - the access to a (in a *= b)
  - the access to b (in a *= b)
  - the access to c (in c += a)
  - the access to a (in c += a)

This is better than d = a * b + c, but I really need a new array d to hold the final result, because I don't want to overwrite the data in array c either. So let's replace c += a with d = a + c; that makes five accesses to whole arrays in total. Under optimal conditions, which a C++ implementation can achieve (d[i] = a[i] * b[i] + c[i]), only four accesses to whole arrays are needed. On a modern CPU, the cost of this kind of simple calculation is negligible compared with memory access, so I guess I still need a better way.

Many thanks again for your reply!

On Fri, Sep 16, 2022 at 15:38, Kevin Sheppard <kevin.k.shepp...@gmail.com> wrote:

> You can use in-place operators where appropriate to avoid memory allocation.
>
> a *= b
> c += a
>
> Kevin
>
> From: 腾刘 <27rabbi...@gmail.com>
> Sent: Friday, September 16, 2022 8:35 AM
> To: Discussion of Numerical Python <numpy-discussion@python.org>
> Subject: [Numpy-discussion] How to avoid this memory copy overhead in
> d=a*b+c?
>
> Hello everyone, I'm here again to ask a naive question about Numpy
> performance.
>
> As far as I know, Numpy's vectorized operators are very effective because
> they use SIMD instructions and multiple threads, compared with index-style
> programming (using a "for" loop and assigning each element by its index
> in the array).
>
> I'm wondering how fast Numpy can be, so I did some experiments. Take
> this simple task as an example:
>
> a = np.random.rand(10_000_000)
> b = np.random.rand(10_000_000)
> c = a + b
>
> To check the performance, I wrote a simple C++ implementation of adding
> two arrays, also using multiple threads (with the compile options -O3
> -mavx2). I found that the C++ implementation is slightly faster than Numpy
> (running each 100 times to get reasonably convincing statistics).
>
> *Here comes the first question: why is there this efficiency gap?*
>
> I guess this is because Numpy needs to load the shared object, find the
> wrapper of the ufunc, and only then execute the underlying computation. Am I
> right? Am I missing something here?
>
> Then I did another experiment with the statement d = a * b + c, where
> a, b, c and d are all numpy arrays. I also implemented this logic in C++
> and found that C++ is 2 times faster than Numpy on average (also executed
> 100 times each).
>
> I guess this is because in Python we first calculate:
>
> temporary_var = a * b
>
> and then:
>
> d = temporary_var + c
>
> so we have an unnecessary memory transfer overhead. Since each array is
> very large, Numpy needs to write temporary_var to memory and then read it
> back into cache.
>
> However, in C++ we can just write d[i] = a[i] * b[i] + c[i], which creates
> no temporary array and so avoids the memory transfer penalty.
>
> *So the second question is: is there a way to avoid this kind of
> overhead?* I've learned that in Numpy we can create our own ufunc
> with *frompyfunc*, but it seems to use neither SIMD optimization nor
> multiple threads, since it is 100 times slower than the "d = a * b + c"
> way.
>
> _______________________________________________
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: 27rabbi...@gmail.com
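
To make the access-count concern concrete, here is a minimal sketch (not something proposed in the thread) of one way to fill a separate array d without a full-size temporary and without touching a, b, or c. It uses the `out=` argument that NumPy ufuncs accept and works block by block, so the intermediate product is more likely to still be in cache when c is added. The function name `multiply_add_blocked` and the default block size are hypothetical; the block size in particular is a guess that would need tuning per machine, and whether this gets close to the C++ timing would have to be measured.

```python
import numpy as np


def multiply_add_blocked(a, b, c, out=None, block=16 * 1024):
    """Compute a * b + c for 1-D arrays, writing the result into `out`.

    Works one block at a time so the intermediate product is written
    into a small slice of `out` rather than a full-size temporary.
    `block` (elements per step) is an assumed value, not a tuned one.
    """
    if out is None:
        out = np.empty_like(a)
    n = a.shape[0]
    for start in range(0, n, block):
        stop = min(start + block, n)
        view = out[start:stop]  # a view into `out`, no copy
        np.multiply(a[start:stop], b[start:stop], out=view)  # view = a * b
        np.add(view, c[start:stop], out=view)                # view += c
    return out


rng = np.random.default_rng(0)
a = rng.random(1_000_000)
b = rng.random(1_000_000)
c = rng.random(1_000_000)

d = multiply_add_blocked(a, b, c)  # a, b and c are left unchanged
```

Without blocking, the same idea is `np.multiply(a, b, out=d)` followed by `d += c`: that avoids the allocation but still makes a second pass over d, which is the extra access counted above.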