On 11/23/12 8:00 PM, Chris Barker - NOAA Federal wrote:
> On Thu, Nov 22, 2012 at 6:20 AM, Francesc Alted <franc...@continuum.io> wrote:
>> As Nathaniel said, there is not a difference in terms of *what* is
>> computed.  However, the methods that you suggested actually differ on
>> *how* they are computed, and that has dramatic effects on the time
>> used.  For example:
>>
>> In []: arr1, arr2, arr3, arr4, arr5 = [np.arange(1e7) for x in range(5)]
>>
>> In []: %time arr1 + arr2 + arr3 + arr4 + arr5
>> CPU times: user 0.05 s, sys: 0.10 s, total: 0.14 s
>> Wall time: 0.15 s
>> There are also ways to minimize the size of temporaries, and numexpr is
>> one of the simplest:
> but you can also use np.add (and friends) to reduce the number of
> temporaries. It can make a difference:
>
> In [11]: def add_5_arrays(arr1, arr2, arr3, arr4, arr5):
>    ....:     result = arr1 + arr2
>    ....:     np.add(result, arr3, out=result)
>    ....:     np.add(result, arr4, out=result)
>    ....:     np.add(result, arr5, out=result)
>
> In [13]: timeit arr1 + arr2 + arr3 + arr4 + arr5
> 1 loops, best of 3: 528 ms per loop
>
> In [17]: timeit add_5_arrays(arr1, arr2, arr3, arr4, arr5)
> 1 loops, best of 3: 293 ms per loop
>
> (don't have numexpr on this machine for a comparison)
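
For anyone who wants to reproduce the np.add timings quoted above outside
of IPython, here is a minimal, self-contained sketch.  A few things in it
are assumptions and not part of the original session: the `return result`
line (the quoted function builds the sum but never returns it), the
`best_ms` helper (a hypothetical stand-in for IPython's timeit), and the
array size of 1e6 instead of 1e7 so that it runs quickly.  The numbers
will of course depend on the machine.

```python
import timeit

import numpy as np


def add_5_arrays(arr1, arr2, arr3, arr4, arr5):
    """Sum five arrays while allocating only one new array."""
    result = arr1 + arr2              # the only allocation
    np.add(result, arr3, out=result)  # accumulate in place, no temporary
    np.add(result, arr4, out=result)
    np.add(result, arr5, out=result)
    return result  # added here; the quoted session omits the return


def best_ms(func, repeat=3):
    """Best wall time over `repeat` single runs, in milliseconds.

    Hypothetical helper, only a rough stand-in for IPython's timeit.
    """
    return min(timeit.repeat(func, repeat=repeat, number=1)) * 1e3


if __name__ == "__main__":
    n = 1_000_000  # assumption: smaller than the 1e7 used in the thread
    arrays = [np.arange(n, dtype=np.float64) for _ in range(5)]
    a, b, c, d, e = arrays

    # Both approaches must compute the same thing; only *how* differs.
    assert np.array_equal(add_5_arrays(*arrays), a + b + c + d + e)

    print("chained +   : %.2f ms" % best_ms(lambda: a + b + c + d + e))
    print("add_5_arrays: %.2f ms" % best_ms(lambda: add_5_arrays(*arrays)))
```

The chained expression allocates a fresh temporary for each of its four
additions, while add_5_arrays allocates once and then writes into that
same buffer, which is where the saving should come from.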

Yes, you are right.  However, numexpr can still beat this:

In [8]: timeit arr1 + arr2 + arr3 + arr4 + arr5
10 loops, best of 3: 138 ms per loop

In [9]: timeit add_5_arrays(arr1, arr2, arr3, arr4, arr5)
10 loops, best of 3: 74.3 ms per loop

In [10]: timeit ne.evaluate("arr1 + arr2 + arr3 + arr4 + arr5")
10 loops, best of 3: 20.8 ms per loop

The reason is that numexpr is multithreaded (using 6 cores above), and
for memory-bound problems like this one, fetching data in several
threads is more efficient than using a single thread:

In [12]: timeit arr1.copy()
10 loops, best of 3: 41 ms per loop

In [13]: ne.set_num_threads(1)
Out[13]: 6

In [14]: timeit ne.evaluate("arr1")
10 loops, best of 3: 30.7 ms per loop

In [15]: ne.set_num_threads(6)
Out[15]: 1

In [16]: timeit ne.evaluate("arr1")
100 loops, best of 3: 13.4 ms per loop

I.e., the joy of multi-threading is that it not only buys you CPU speed,
but can also bring your data from memory faster.  So yeah, modern
applications *do* need multi-threading to get good performance.

-- 
Francesc Alted

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion