Re: [Numpy-discussion] Numpy 1.11.0b1 is out

Derek Homeier Mon, 01 Feb 2016 16:24:08 -0800

> On 31 Jan 2016, at 9:48 am, Sebastian Berg <sebast...@sipsolutions.net> wrote:
> 
> On Sa, 2016-01-30 at 20:27 +0100, Derek Homeier wrote:
>> On 27 Jan 2016, at 1:10 pm, Sebastian Berg <
>> sebast...@sipsolutions.net> wrote:
>>> 
>>> On Mi, 2016-01-27 at 11:19 +0000, Nadav Horesh wrote:
>>>> Why is the dot function/method slower than @ on Python 3.5.1?
>>>> Tested
>>>> from the latest 1.11 maintenance branch.
>>>> 
>>> 
>>> The explanation I think is that you do not have a blas
>>> optimization. In
>>> which case the fallback mode is probably faster in the @ case
>>> (since it
>>> has SSE2 optimization by using einsum, while np.dot does not do
>>> that).
>> 
>> I am a bit confused now, as A @ c is just short for A.__matmul__(c)
>> or equivalent
>> to np.matmul(A,c), so why would these not use the optimised blas?
>> Also, I am getting almost identical results on my Mac, yet I thought
>> numpy would
>> by default build against the VecLib optimised BLAS. If I build
>> explicitly against
>> ATLAS, I am actually seeing slightly slower results.
>> But I also saw this kind of warning on the first timeit runs:
>> 
>> %timeit A.dot(c)
>> The slowest run took 6.91 times longer than the fastest. This could
>> mean that an intermediate result is being cached
>> 
>> and when testing much larger arrays, the discrepancy between matmul
>> and dot rather
>> increases, so perhaps this is more an issue of a less
>> memory-efficient implementation
>> in np.dot?
> 
> Sorry, I missed the fact that one of the arrays was 3D. In that case I
> am not even sure which of the functions call into blas or what else
> they have to do, would have to check. Note that `np.dot` uses a
> different way of combining high-dimensional arrays. @/matmul
> broadcasts extra axes, while np.dot will do the outer combination of
> them, so that the result is:
> 
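> As = list(A.shape)                 # shapes are tuples, so copy to a list
> As.pop(-1)                         # drop the contracted (last) axis of A
> cs = list(c.shape)
> cs.pop(-2 if c.ndim >= 2 else -1)  # second-to-last axis of c, if possible
> result_shape = tuple(As + cs)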
> 
> which happens to be identical to the @/matmul result shape if only
> A.ndim > 2 and c.ndim <= 2.


Makes sense now; with A.ndim = 2 both operations take about the same time
(and are ~50% faster with VecLib than with ATLAS) and yield identical results,
while any additional dimension in A adds more overhead time to np.dot,
and the results are np.allclose, but not exactly identical.
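
A minimal sketch of the shape rule quoted above, assuming small made-up
array shapes (the helper dot_result_shape is hypothetical, not a NumPy
function, and the sizes are not the ones from the original timings):

```python
import numpy as np

def dot_result_shape(a_shape, c_shape):
    """Shape rule quoted above for np.dot; assumes both operands have ndim >= 1."""
    a_rest = list(a_shape)
    a_rest.pop(-1)                                # contracted axis of A
    c_rest = list(c_shape)
    c_rest.pop(-2 if len(c_shape) >= 2 else -1)   # contracted axis of c
    return tuple(a_rest + c_rest)

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 5, 6))    # A.ndim > 2
c2 = rng.standard_normal((6, 3))      # c.ndim <= 2
c3 = rng.standard_normal((4, 6, 3))   # c.ndim > 2

# With a 2-D c, np.dot and matmul give the same shape and close values.
same_dot = np.dot(A, c2)
same_mm = np.matmul(A, c2)

# With a 3-D c, np.dot forms the outer combination of the extra axes,
# while matmul broadcasts them against each other.
outer = np.dot(A, c3)
bcast = np.matmul(A, c3)

# The broadcast result is the "diagonal" of the outer combination.
diag = np.stack([outer[i, :, i, :] for i in range(A.shape[0])])

print(same_dot.shape, same_mm.shape, np.allclose(same_dot, same_mm))
print(outer.shape, bcast.shape, np.allclose(diag, bcast))
```

With the 2-D c both calls agree in shape; with the 3-D c, np.dot returns
the larger outer combination and the matmul result is only a slice of it.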

Thanks,
                                                Derek

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion
