On Mon, Nov 22, 2010 at 1:59 PM, Keith Goodman <kwgood...@gmail.com> wrote: > On Mon, Nov 22, 2010 at 10:51 AM, <josef.p...@gmail.com> wrote: >> On Mon, Nov 22, 2010 at 1:39 PM, Keith Goodman <kwgood...@gmail.com> wrote: >>> On Mon, Nov 22, 2010 at 10:32 AM, <josef.p...@gmail.com> wrote: >>>> On Mon, Nov 22, 2010 at 1:26 PM, Keith Goodman <kwgood...@gmail.com> wrote: >>>>> On Mon, Nov 22, 2010 at 9:03 AM, Keith Goodman <kwgood...@gmail.com> >>>>> wrote: >>>>> >>>>>> @cython.boundscheck(False) >>>>>> @cython.wraparound(False) >>>>>> def nanstd_twopass(np.ndarray[np.float64_t, ndim=1] a, int ddof): >>>>>> "nanstd of 1d numpy array with dtype=np.float64 along axis=0." >>>>>> cdef Py_ssize_t i >>>>>> cdef int a0 = a.shape[0], count = 0 >>>>>> cdef np.float64_t asum = 0, a2sum=0, amean, ai, da >>>>>> for i in range(a0): >>>>>> ai = a[i] >>>>>> if ai == ai: >>>>>> asum += ai >>>>>> count += 1 >>>>>> if count > 0: >>>>>> amean = asum / count >>>>>> asum = 0 >>>>>> for i in range(a0): >>>>>> ai = a[i] >>>>>> if ai == ai: >>>>>> da = ai - amean >>>>>> asum += da >>>>>> a2sum += (da * da) >>>>>> asum = asum * asum >>>>>> return sqrt((a2sum - asum / count) / (count - ddof)) >>>>>> else: >>>>>> return np.float64(NAN) >>>>> >>>>> This is 5% faster: >>>>> >>>>> @cython.boundscheck(False) >>>>> @cython.wraparound(False) >>>>> def nanstd_1d_float64_axis0_2(np.ndarray[np.float64_t, ndim=1] a, int >>>>> ddof): >>>>> "nanstd of 1d numpy array with dtype=np.float64 along axis=0." >>>>> cdef Py_ssize_t i >>>>> cdef int a0 = a.shape[0], count = 0 >>>>> cdef np.float64_t asum = 0, amean, ai >>>>> for i in range(a0): >>>>> ai = a[i] >>>>> if ai == ai: >>>>> asum += ai >>>>> count += 1 >>>>> if count > 0: >>>>> amean = asum / count >>>>> asum = 0 >>>>> for i in range(a0): >>>>> ai = a[i] >>>>> if ai == ai: >>>>> ai -= amean >>>>> asum += (ai * ai) >>>>> return sqrt(asum / (count - ddof)) >>>>> else: >>>>> return np.float64(NAN) >>>> >>>> I think it would be better to write nanvar instead of nanstd and take >>>> the square root only in a delegating nanstd, instead of the other way >>>> around. (Also a change that should be made in scipy.stats) >>> >>> Yeah, I noticed that numpy does that. I was planning to have separate >>> var and std functions. Here's why (from the readme file, but maybe I >>> should template it, the sqrt automatically converts large ddof to >>> NaN): >> >> I'm not sure what you are saying, dropping the squareroot in the >> function doesn't require nan handling in the inner loop. If you want >> to return nan when count-ddof<=0, then you could just replace >> >> if count > 0: >> ... >> >> by >> if count -ddof > 0: >> ... >> >> Or am I missing the point? > > Yes, sorry. Ignore the sqrt/nan comment. The point is that I want to > be able to return the underlying, low-level cython function (see > below). So I need separate var and std versions to do that (unless I > can modify default kwargs on the fly).
I think you could still delegate at the cython level, but given the amount of code duplication that is required, I don't expect it to make much difference for staying DRY. Josef > >>> Under the hood Nanny uses a separate Cython function for each >>> combination of ndim, dtype, and axis. A lot of the overhead in >>> ny.nanmax, for example, is in checking that your axis is within range, >>> converting non-array data to an array, and selecting the function to >>> use to calculate nanmax. >>> >>> You can get rid of the overhead by doing all this before you, say, >>> enter an inner loop: >>> >>>>>> arr = np.random.rand(10,10) >>>>>> axis = 0 >>>>>> func, a = ny.func.nanmax_selector(arr, axis) >>>>>> func.__name__ >>> 'nanmax_2d_float64_axis0' >>> >>> Let's see how much faster than runs: >>> >>>>> timeit np.nanmax(arr, axis=0) >>> 10000 loops, best of 3: 25.7 us per loop >>>>> timeit ny.nanmax(arr, axis=0) >>> 100000 loops, best of 3: 5.25 us per loop >>>>> timeit func(a) >>> 100000 loops, best of 3: 2.5 us per loop >>> >>> Note that func is faster than the Numpy's non-nan version of max: >>> >>>>> timeit arr.max(axis=0) >>> 100000 loops, best of 3: 3.28 us per loop >>> >>> So adding NaN protection to your inner loops has a negative cost! > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion