Hmm, I wasn't being clear :) The default malloc on BlueGene/Q only returns 8-byte alignment, but the SIMD units need 32-byte alignment for loads, stores, and operations, or performance suffers. On the /P the required alignment was 16 bytes, but malloc only gave you 8, and trying to perform vectorized loads/stores on unaligned memory generated alignment exceptions.
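
For illustration, a minimal sketch in C of explicitly requesting the stricter alignment, assuming a POSIX system that provides posix_memalign. The helper names alloc_aligned and is_aligned are hypothetical, and the 32-byte figure is just the /Q requirement mentioned above, not anything NumPy does today.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* expose posix_memalign in strict C modes */
#endif
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate `size` bytes whose address is a multiple
 * of `alignment`. posix_memalign requires `alignment` to be a power of
 * two and a multiple of sizeof(void *). Returns NULL on failure; release
 * the memory with free(). */
void *alloc_aligned(size_t alignment, size_t size)
{
    void *p = NULL;
    if (posix_memalign(&p, alignment, size) != 0)
        return NULL;
    return p;
}

/* Hypothetical helper: returns 1 if `p` is a multiple of `alignment`. */
int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}
```

A caller would write something like `double *buf = alloc_aligned(32, n * sizeof(double));` and could check `is_aligned(buf, 32)` before taking a vectorized code path. A pointer from plain malloc may or may not pass that check, which is exactly the problem.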
See https://wiki.alcf.anl.gov/parts/index.php/Blue_Gene/Q and https://computing.llnl.gov/tutorials/bgp/BGP-usage.Walkup.pdf (slide 14 for an overview, slide 15 for the effective performance difference between the unaligned and aligned code) for some notes on this.

A

On Thu, Apr 17, 2014 at 10:18 AM, Nathaniel Smith <[email protected]> wrote:

> On 17 Apr 2014 15:09, "Aron Ahmadia" <[email protected]> wrote:
> >
> > > On the one hand it would be nice to actually know whether
> > > posix_memalign is important, before making api decisions on this basis.
> >
> > FWIW: On the lightweight IBM cores that the extremely popular BlueGene
> > machines were based on, accessing unaligned memory raised system faults.
> > The default behavior of these machines was to terminate the program if
> > more than 1000 such errors occurred on a given process, and an environment
> > variable allowed you to terminate the program if *any* unaligned memory
> > access occurred. This is because unaligned memory accesses were 15x (or
> > more) slower than aligned memory access.
> >
> > The newer /Q chips seem to be a little more forgiving of this, but I
> > think one can in general expect allocated memory alignment to be an
> > important performance technique for future high performance computing
> > architectures.
>
> Right, this much is true on lots of architectures, and so malloc is
> careful to always return values with sufficient alignment (e.g. 8 bytes) to
> make sure that any standard operation can succeed.
>
> The question here is whether it will be important to have *even more*
> alignment than malloc gives us by default. A 16 or 32 byte wide SIMD
> instruction might prefer that data have 16 or 32 byte alignment, even if
> normal memory access for the types being operated on only requires 4 or 8
> byte alignment.
>
> -n
>
> _______________________________________________
> NumPy-Discussion mailing list
> [email protected]
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
