On Mon, 2 Jan 2017 18:46:08 -0800 Nathaniel Smith <n...@pobox.com> wrote:

> So some options include:
> - make the default integer precision 64-bits everywhere
> - make the default integer precision 32-bits on 32-bit systems, and
>   64-bits on 64-bit systems (including Windows)
Either of those two would be the best IMO. Intuitively, I think people
would expect 32-bit ints in 32-bit processes by default, and 64-bit
ints in 64-bit processes likewise. So I would slightly favour the
latter option.

> - leave the default integer precision the same, but make accumulators
>   64-bits everywhere
> - leave the default integer precision the same, but make accumulators
>   64-bits on 64-bit systems (including Windows)

Both of these options introduce a confusing discrepancy.

> - speed: there's probably some cost to using 64-bit integers on 32-bit
>   systems; how big is the penalty in practice?

Ok, I have fired up a Windows VM to compare 32-bit and 64-bit builds.
The NumPy version is 1.11.2 and the Python version is 3.5.2. Keep in
mind these are Anaconda builds of NumPy, with MKL enabled for linear
algebra; YMMV.

For each benchmark, the first number is the result on the 32-bit
build, the second number on the 64-bit build.

Simple arithmetic
-----------------

>>> v = np.ones(1024**2, dtype='int32')
>>> %timeit v + v    # 1.73 ms per loop | 1.78 ms per loop
>>> %timeit v * v    # 1.77 ms per loop | 1.79 ms per loop
>>> %timeit v // v   # 5.89 ms per loop | 5.39 ms per loop

>>> v = np.ones(1024**2, dtype='int64')
>>> %timeit v + v    # 3.54 ms per loop | 3.54 ms per loop
>>> %timeit v * v    # 5.61 ms per loop | 3.52 ms per loop
>>> %timeit v // v   # 17.1 ms per loop | 13.9 ms per loop

Linear algebra
--------------

>>> m = np.ones((1024, 1024), dtype='int32')
>>> %timeit m @ m    # 556 ms per loop | 569 ms per loop

>>> m = np.ones((1024, 1024), dtype='int64')
>>> %timeit m @ m    # 3.81 s per loop | 1.01 s per loop

Sorting
-------

>>> v = np.random.RandomState(42).randint(1000, size=1024**2).astype('int32')
>>> %timeit np.sort(v)   # 43.4 ms per loop | 44 ms per loop

>>> v = np.random.RandomState(42).randint(1000, size=1024**2).astype('int64')
>>> %timeit np.sort(v)   # 61.5 ms per loop | 45.5 ms per loop

Indexing
--------

>>> v = np.ones(1024**2, dtype='int32')
>>> %timeit v[v[::-1]]   # 2.38 ms per loop | 4.63 ms per loop

>>> v = np.ones(1024**2, dtype='int64')
>>> %timeit v[v[::-1]]   # 6.9 ms per loop | 3.63 ms per loop

Quick summary:

- for very simple operations, the 32-bit and 64-bit builds perform the
  same at a given integer bitwidth (though throughput is roughly halved
  on 64-bit integers when the operation is SIMD-vectorized)
- for more sophisticated operations (such as element-wise
  multiplication or division, or quicksort, and much more so the matrix
  product), the 32-bit build is competitive with the 64-bit build on
  32-bit ints, but lags behind on 64-bit ints
- for indexing, it is desirable to use a "native"-width integer,
  regardless of whether that means 32-bit or 64-bit

Of course the numbers will vary depending on the platform (read:
compiler), but some aspects of this comparison will probably translate
to other platforms.

Regards

Antoine.

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion
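P.S. The platform dependence under discussion, namely that NumPy's
default integer follows the C "long" (32-bit on 64-bit Windows under
LLP64, 64-bit on 64-bit Linux/macOS under LP64), can be checked
directly from Python. A minimal sketch, assuming a NumPy build of this
era where the default integer maps to C long:

```python
# Sketch: compare the process pointer width, the C "long" width, and
# NumPy's resulting default integer dtype.  On 64-bit Linux/macOS all
# three are 64-bit; on 64-bit Windows the C long (and hence the default
# integer) stays 32-bit while pointers, and np.intp, are 64-bit.
import ctypes
import struct

import numpy as np

pointer_bits = struct.calcsize("P") * 8          # process width: 32 or 64
c_long_bits = ctypes.sizeof(ctypes.c_long) * 8   # width of a C "long"
default_dtype = np.array([1, 2, 3]).dtype        # default integer dtype

print("process:", pointer_bits, "bits")
print("C long:", c_long_bits, "bits")
print("default integer dtype:", default_dtype)
print("native index dtype (np.intp):", np.dtype(np.intp))
```

np.intp is the dtype NumPy uses internally for index arrays, which is
why the indexing benchmark above favours native-width integers.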