Uh, 15x slower for unaligned access is quite a lot. But Intel (and AMD) architectures are much more tolerant in this respect (and improving). For example, with a Xeon(R) CPU E5-2670 (2 years old) I get:

In [1]: import numpy as np

In [2]: shape = (10000, 10000)

In [3]: x_aligned = np.zeros(shape, dtype=[('x',np.float64),('y',np.int64)])['x']

In [4]: x_unaligned = np.zeros(shape, dtype=[('y1',np.int8),('x',np.float64),('y2',np.int8,(7,))])['x']

In [5]: %timeit res = x_aligned ** 2
1 loops, best of 3: 289 ms per loop

In [6]: %timeit res = x_unaligned ** 2
1 loops, best of 3: 664 ms per loop
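By the way, you can check that the two views really do differ in alignment by inspecting the data pointers and flags (the sketch below uses a smaller shape than above just to keep the check cheap):

```python
import numpy as np

shape = (1000, 1000)  # scaled down from (10000, 10000), just for the check
x_aligned = np.zeros(shape, dtype=[('x', np.float64), ('y', np.int64)])['x']
x_unaligned = np.zeros(shape, dtype=[('y1', np.int8), ('x', np.float64),
                                     ('y2', np.int8, (7,))])['x']

# The 'x' field of the second dtype starts at byte offset 1 inside each
# 16-byte record, so every element of x_unaligned sits at an odd address.
print(x_aligned.ctypes.data % 8)    # 0
print(x_unaligned.ctypes.data % 8)  # 1
print(x_aligned.flags.aligned)      # True
print(x_unaligned.flags.aligned)    # False
```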

so the added cost in this case is just a bit more than 2x. But you can also alleviate this overhead by doing a copy that fits in cache before performing the computations. numexpr does this:

https://github.com/pydata/numexpr/blob/master/numexpr/interp_body.cpp#L203
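The idea can be sketched in pure NumPy like this (the helper name and the block size are made up for illustration; numexpr does the real thing in C):

```python
import numpy as np

def blocked_square(x, block_size=4096):
    # Process a 1-D (possibly unaligned, strided) array chunk by chunk,
    # copying each chunk into a small, freshly allocated (hence aligned
    # and contiguous) scratch buffer before operating on it.
    out = np.empty(x.shape, dtype=x.dtype)
    for start in range(0, x.shape[0], block_size):
        chunk = np.array(x[start:start + block_size])  # aligned copy
        out[start:start + block_size] = chunk * chunk
    return out
```

Each scratch copy stays resident in cache while it is squared, so the cost of the unaligned access is paid only once, during the cheap sequential copy.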

and the results are pretty good:

In [8]: import numexpr as ne

In [9]: %timeit res = ne.evaluate('x_aligned ** 2')
10 loops, best of 3: 133 ms per loop

In [10]: %timeit res = ne.evaluate('x_unaligned ** 2')
10 loops, best of 3: 134 ms per loop

i.e. there is not a significant difference between aligned and unaligned access to data.

I wonder if the same technique could be applied to NumPy.

Francesc


On 17/04/14 16:26, Aron Ahmadia wrote:
Hmnn, I wasn't being clear :)

The default malloc on BlueGene/Q only returns 8-byte alignment, but the SIMD units need 32-byte alignment for loads, stores, and operations, or performance suffers. On the /P the required alignment was 16 bytes, but malloc only gave you 8, and trying to perform vectorized loads/stores on unaligned memory generated alignment exceptions.

See https://wiki.alcf.anl.gov/parts/index.php/Blue_Gene/Q and https://computing.llnl.gov/tutorials/bgp/BGP-usage.Walkup.pdf (slide 14 for an overview, slide 15 for the effective performance difference between the unaligned and aligned code) for some notes on this.
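When malloc's default alignment is not enough, a common workaround is to over-allocate and slice off the misaligned prefix; in NumPy that might look like the sketch below (the helper name and the 32-byte figure are illustrative, not from any particular library):

```python
import numpy as np

def aligned_empty(shape, dtype=np.float64, align=32):
    # Over-allocate a raw byte buffer, then skip the leading bytes
    # needed to land the data pointer on an `align`-byte boundary.
    dtype = np.dtype(dtype)
    nbytes = int(np.prod(shape)) * dtype.itemsize
    buf = np.empty(nbytes + align, dtype=np.uint8)
    offset = (-buf.ctypes.data) % align
    return buf[offset:offset + nbytes].view(dtype).reshape(shape)
```

The returned array keeps the over-allocated buffer alive via its `base` attribute, so no memory is leaked.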

A




On Thu, Apr 17, 2014 at 10:18 AM, Nathaniel Smith <n...@pobox.com> wrote:

    On 17 Apr 2014 15:09, "Aron Ahmadia" <a...@ahmadia.net> wrote:
    >
    > > On the one hand it would be nice to actually know whether
    posix_memalign is important, before making API decisions on this
    basis.
    >
    > FWIW: On the lightweight IBM cores that the extremely popular
    BlueGene machines were based on, accessing unaligned memory raised
    system faults.  The default behavior of these machines was to
    terminate the program if more than 1000 such errors occurred on a
    given process, and an environment variable allowed you to
    terminate the program if *any* unaligned memory access occurred.
     This is because unaligned memory accesses were 15x (or more)
    slower than aligned memory access.
    >
    > The newer /Q chips seem to be a little more forgiving of this,
    but I think one can in general expect allocated memory alignment
    to be an important performance technique for future high
    performance computing architectures.

    Right, this much is true on lots of architectures, and so malloc
    is careful to always return values with sufficient alignment (e.g.
    8 bytes) to make sure that any standard operation can succeed.

    The question here is whether it will be important to have *even
    more* alignment than malloc gives us by default. A 16 or 32 byte
    wide SIMD instruction might prefer that data have 16 or 32 byte
    alignment, even if normal memory access for the types being
    operated on only requires 4 or 8 byte alignment.

    -n
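What malloc happens to give can be probed empirically; the figures depend entirely on the platform allocator, and nothing beyond the basic guarantee is promised by NumPy itself:

```python
import numpy as np

# Probe the data-pointer alignment of freshly allocated arrays.
# Only the basic malloc alignment (8 bytes here) can be relied on;
# 16, 32, or 64 bytes is platform luck, which is exactly why
# posix_memalign comes up in this discussion.
addrs = [np.empty(1000).ctypes.data for _ in range(200)]
for align in (8, 16, 32, 64):
    hits = sum(a % align == 0 for a in addrs)
    print(f"{align:2d}-byte aligned: {hits}/{len(addrs)}")
```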


    _______________________________________________
    NumPy-Discussion mailing list
    NumPy-Discussion@scipy.org
    http://mail.scipy.org/mailman/listinfo/numpy-discussion






--
Francesc Alted
