On 02/17/2014 03:42 PM, Nathaniel Smith wrote:
> Another optimization we should consider that might help a lot in the
> same situations where this would help: for code called from the
> cpython eval loop, it's afaict possible to determine which inputs are
> temporaries by checking their refcnt. In the second call to __add__ in
> '(a + b) + c', the temporary will have refcnt 1, while the other
> arrays will all have refcnt >1. In such cases (subject to various
> sanity checks on shape, dtype, etc) we could elide temporaries by
> reusing the input array for the output. The risk is that there may be
> some code out there that calls these operations directly from C with
> non-temp arrays that nonetheless have refcnt 1, but we should at least
> investigate the feasibility. E.g. maybe we can do the optimization for
> tp_add but not PyArray_Add.

For element-wise operations such as the above, wouldn't it be even better to use loop fusion, evaluating the entire compound expression per element instead of each individual operation? That would require methods such as __add__ to return an operation object rather than the result value. I believe a technique like that is used in the numexpr package (https://github.com/pydata/numexpr), which I saw announced here recently...

FWIW,
Stefan

PS: Such a loop-fusion technique would also open the door to other optimizations, such as vectorization (SIMD)...

-- 
...ich hab' noch einen Koffer in Berlin...
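
To make the operation-object idea above a bit more concrete, here is a minimal pure-Python sketch. It is not how NumPy or numexpr work internally; the class names (Expr, Leaf, BinOp) and the at()/evaluate() methods are hypothetical and exist only for illustration. The point is just that __add__ builds a small expression tree, and a single loop at the end computes the whole compound expression per element, so no intermediate array is allocated for (a + b).

```python
class Expr:
    """Hypothetical base class: operators build a tree instead of computing."""

    def __add__(self, other):
        return BinOp(lambda x, y: x + y, self, other)

    def __mul__(self, other):
        return BinOp(lambda x, y: x * y, self, other)

    def evaluate(self):
        # The single fused loop: each output element is computed from the
        # whole expression tree, with no intermediate result sequences.
        return [self.at(i) for i in range(len(self))]


class Leaf(Expr):
    """Wraps concrete data (a plain list standing in for an array)."""

    def __init__(self, data):
        self.data = list(data)

    def __len__(self):
        return len(self.data)

    def at(self, i):
        return self.data[i]


class BinOp(Expr):
    """Deferred element-wise binary operation on two sub-expressions."""

    def __init__(self, fn, left, right):
        if len(left) != len(right):
            raise ValueError("operands must have the same length")
        self.fn, self.left, self.right = fn, left, right

    def __len__(self):
        return len(self.left)

    def at(self, i):
        # Evaluate both operands for element i only, then combine them.
        return self.fn(self.left.at(i), self.right.at(i))


a = Leaf([1, 2, 3])
b = Leaf([10, 20, 30])
c = Leaf([100, 200, 300])

expr = (a + b) + c        # builds a tree; nothing is computed yet
result = expr.evaluate()  # one pass over the elements
print(result)             # [111, 222, 333]
```

In pure Python the per-element function calls make this slower than eager evaluation, of course; the benefit would only show up if the fused loop were compiled or otherwise executed natively, which is also where vectorization could come in.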