On 06.07.2015 18:21, Francesc Alted wrote:
> 2015-07-06 18:04 GMT+02:00 Jaime Fernández del Río <jaime.f...@gmail.com>:
>
> > On Mon, Jul 6, 2015 at 10:18 AM, Francesc Alted <fal...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > I have stumbled into this:
> > >
> > > In [62]: sa = np.fromiter(((i,i) for i in range(1000*1000)),
> > >              dtype=[('f0', np.int64), ('f1', np.int32)])
> > >
> > > In [63]: %timeit sa['f0'].sum()
> > > 100 loops, best of 3: 4.52 ms per loop
> > >
> > > In [64]: sa = np.fromiter(((i,i) for i in range(1000*1000)),
> > >              dtype=[('f0', np.int64), ('f1', np.int64)])
> > >
> > > In [65]: %timeit sa['f0'].sum()
> > > 1000 loops, best of 3: 896 µs per loop
> > >
> > > The first structured array is made of 12-byte records, while the
> > > second is made of 16-byte records, but the latter performs 5x
> > > faster. Also, using a structured array that is made of 8-byte
> > > records is the fastest (as expected):
> > >
> > > In [66]: sa = np.fromiter(((i,) for i in range(1000*1000)),
> > >              dtype=[('f0', np.int64)])
> > >
> > > In [67]: %timeit sa['f0'].sum()
> > > 1000 loops, best of 3: 567 µs per loop
> > >
> > > Now, my laptop has an Ivy Bridge processor (i5-3380M) that should
> > > perform quite well on unaligned data:
> > >
> > > http://lemire.me/blog/archives/2012/05/31/data-alignment-for-speed-myth-or-reality/
> > >
> > > So, if 4-year-old Intel architectures do not have a penalty for
> > > unaligned access, why am I seeing one in NumPy? That strikes me
> > > as quite strange.
> >
> > I believe that the way numpy is set up, it never does unaligned
> > access, regardless of the platform, in case it gets run on one that
> > would go up in flames if you tried to. So my guess would be that you
> > are seeing chunked copies into a buffer, as opposed to bulk copying
> > or no copying at all, and that would explain your timing
> > differences. But Julian or Sebastian can probably give you a more
> > informed answer.
>
> Yes, my guess is that you are right. I suppose that it is possible to
> improve the numpy codebase to accelerate this particular access pattern
> on Intel platforms, but given that structured arrays are not that
> widely used (pandas is probably leading this use case by far, and as
> far as I know, they are not using structured arrays internally in
> DataFrames), maybe it is not worth worrying about this too much.
>
> Thanks anyway,
> Francesc
>
> > Jaime
> >
> > > Thanks,
> > > Francesc
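
For anyone who wants to check the layout side of this on their own machine, here is a minimal sketch. It builds the three dtypes from the thread and reports the record size, the stride of the 'f0' field view, and whether NumPy flags that view as aligned and contiguous. The names `packed`, `padded`, `single` and `describe` are made up for illustration and do not come from the thread. The sketch only inspects layout; it does not time anything, and it does not show which code path `sum()` takes internally. The value of the aligned flag depends on the platform's alignment requirement for int64, so it is printed rather than assumed.

```python
import numpy as np

n = 1000  # the layout does not depend on the number of records

# The three record layouts from the thread (illustrative names).
packed = np.dtype([('f0', np.int64), ('f1', np.int32)])  # 12-byte records
padded = np.dtype([('f0', np.int64), ('f1', np.int64)])  # 16-byte records
single = np.dtype([('f0', np.int64)])                    # 8-byte records


def describe(dt):
    """Return (itemsize, f0 stride, aligned flag, C-contiguous flag)."""
    a = np.zeros(n, dtype=dt)
    f0 = a['f0']  # a strided view into the records, not a copy
    return (dt.itemsize, f0.strides[0],
            bool(f0.flags.aligned), bool(f0.flags.c_contiguous))


for name, dt in [('packed', packed), ('padded', padded), ('single', single)]:
    itemsize, stride, aligned, contiguous = describe(dt)
    print(name, 'itemsize=%d' % itemsize, 'stride=%d' % stride,
          'aligned=%s' % aligned, 'contiguous=%s' % contiguous)
```

If the aligned flag comes out False only for the 12-byte records, that would be consistent with Jaime's guess that NumPy avoids the direct access path there, though it is not proof of it.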
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion