Uh, 15x slower for unaligned access is quite a lot. But Intel (and AMD)
architectures are much more tolerant in this respect (and improving).
For example, with a Xeon(R) CPU E5-2670 (2 years old) I get:
In [1]: import numpy as np
In [2]: shape = (10000, 10000)
In [3]: x_aligned = np.zeros(shape,
   ...:     dtype=[('x', np.float64), ('y', np.int64)])['x']
In [4]: x_unaligned = np.zeros(shape,
   ...:     dtype=[('y1', np.int8), ('x', np.float64), ('y2', np.int8, (7,))])['x']
In [5]: %timeit res = x_aligned ** 2
1 loops, best of 3: 289 ms per loop
In [6]: %timeit res = x_unaligned ** 2
1 loops, best of 3: 664 ms per loop
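By the way, one can verify that the two views really do differ in alignment
without timing anything, since NumPy tracks this in the array flags. A small
illustrative snippet (smaller shape than the benchmark, just for speed):

```python
import numpy as np

shape = (1000, 1000)
x_aligned = np.zeros(shape, dtype=[('x', np.float64), ('y', np.int64)])['x']
x_unaligned = np.zeros(
    shape,
    dtype=[('y1', np.int8), ('x', np.float64), ('y2', np.int8, (7,))])['x']

# In the second dtype, the 'x' field starts at byte offset 1 inside each
# 16-byte record, so its data pointer is off by one from 8-byte alignment.
print(x_aligned.flags.aligned)    # True
print(x_unaligned.flags.aligned)  # False
```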
so the added cost in this case is just a bit more than 2x. But you can
also alleviate this overhead by doing a copy that fits in cache prior
to the computations. numexpr does this:
https://github.com/pydata/numexpr/blob/master/numexpr/interp_body.cpp#L203
and the results are pretty good:
In [8]: import numexpr as ne
In [9]: %timeit res = ne.evaluate('x_aligned ** 2')
10 loops, best of 3: 133 ms per loop
In [10]: %timeit res = ne.evaluate('x_unaligned ** 2')
10 loops, best of 3: 134 ms per loop
i.e. there is no significant difference between aligned and unaligned
access to the data.
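The trick can be sketched in pure Python/NumPy. This is only an
illustration of the idea (the block size, the `blocked_square` name and
the squaring kernel are mine, not numexpr's actual code): copy each
cache-sized chunk into a freshly allocated (hence aligned) scratch
buffer, then run the kernel on the scratch:

```python
import numpy as np

def blocked_square(x, block=4096):
    """Square `x` chunk by chunk, copying each chunk into an aligned
    scratch buffer first so the hot loop always reads aligned memory."""
    flat = np.ravel(x)                         # view when possible
    out = np.empty(flat.size, dtype=flat.dtype)
    scratch = np.empty(block, dtype=flat.dtype)  # fresh allocation: aligned
    for start in range(0, flat.size, block):
        n = min(block, flat.size - start)
        scratch[:n] = flat[start:start + n]    # one cheap unaligned->aligned copy
        np.multiply(scratch[:n], scratch[:n], out=out[start:start + n])
    return out.reshape(x.shape)

# Exercise it on a deliberately misaligned float64 view:
buf = np.zeros(8 * 10000 + 1, dtype=np.uint8)
x = buf[1:].view(np.float64)
x[:] = np.arange(10000, dtype=np.float64)
print(np.array_equal(blocked_square(x), x ** 2))  # True
```

In C (as in the numexpr interpreter loop linked above) the copy and the
kernel would be fused, so the scratch buffer stays hot in L1/L2 cache.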
I wonder if the same technique could be applied to NumPy.
Francesc
On 17/04/14 16:26, Aron Ahmadia wrote:
Hmm, I wasn't being clear :)
The default malloc on BlueGene/Q only returns 8-byte alignment, but
the SIMD units need 32-byte alignment for loads, stores, and
operations or performance suffers. On the /P the required alignment
was 16 bytes, but malloc only gave you 8, and trying to perform
vectorized loads/stores generated alignment exceptions on unaligned
memory.
See https://wiki.alcf.anl.gov/parts/index.php/Blue_Gene/Q and
https://computing.llnl.gov/tutorials/bgp/BGP-usage.Walkup.pdf (slide
14 for an overview, slide 15 for the effective performance difference
between the unaligned/aligned code) for some notes on this.
A
On Thu, Apr 17, 2014 at 10:18 AM, Nathaniel Smith <n...@pobox.com> wrote:
On 17 Apr 2014 15:09, "Aron Ahmadia" <a...@ahmadia.net> wrote:
>
> > On the one hand it would be nice to actually know whether
> > posix_memalign is important, before making api decisions on this
> > basis.
>
> FWIW: On the lightweight IBM cores that the extremely popular
> BlueGene machines were based on, accessing unaligned memory raised
> system faults. The default behavior of these machines was to
> terminate the program if more than 1000 such errors occurred on a
> given process, and an environment variable allowed you to
> terminate the program if *any* unaligned memory access occurred.
> This is because unaligned memory accesses were 15x (or more)
> slower than aligned memory access.
>
> The newer /Q chips seem to be a little more forgiving of this,
> but I think one can in general expect allocated memory alignment
> to be an important performance technique for future high
> performance computing architectures.
Right, this much is true on lots of architectures, and so malloc
is careful to always return values with sufficient alignment (e.g.
8 bytes) to make sure that any standard operation can succeed.
The question here is whether it will be important to have *even
more* alignment than malloc gives us by default. A 16 or 32 byte
wide SIMD instruction might prefer that data have 16 or 32 byte
alignment, even if normal memory access for the types being
operated on only requires 4 or 8 byte alignment.
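For what it's worth, one can already get such over-aligned buffers from
Python today without posix_memalign, by over-allocating a byte buffer and
slicing off the misaligned prefix. A sketch (the `aligned_empty` helper
name is mine; a real fix would live at the allocator level in C):

```python
import numpy as np

def aligned_empty(shape, dtype=np.float64, alignment=32):
    """Return an empty array whose data pointer is `alignment`-byte aligned.

    Over-allocates a raw byte buffer, then slices from the first properly
    aligned offset. (At the C level one would call posix_memalign instead.)
    """
    dtype = np.dtype(dtype)
    nbytes = int(np.prod(shape)) * dtype.itemsize
    buf = np.empty(nbytes + alignment, dtype=np.uint8)
    offset = (-buf.ctypes.data) % alignment   # bytes to skip to reach alignment
    return buf[offset:offset + nbytes].view(dtype).reshape(shape)

a = aligned_empty((100, 100), alignment=32)
print(a.ctypes.data % 32)  # 0
```

Note that the returned array only borrows the byte buffer's memory, so
this is fine for scratch space but not a substitute for aligning NumPy's
own allocations.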
-n
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
--
Francesc Alted