On Mon, Nov 22, 2010 at 10:32 AM, <[email protected]> wrote:
> On Mon, Nov 22, 2010 at 1:26 PM, Keith Goodman <[email protected]> wrote:
>> On Mon, Nov 22, 2010 at 9:03 AM, Keith Goodman <[email protected]> wrote:
>>
>>> @cython.boundscheck(False)
>>> @cython.wraparound(False)
>>> def nanstd_twopass(np.ndarray[np.float64_t, ndim=1] a, int ddof):
>>>     "nanstd of 1d numpy array with dtype=np.float64 along axis=0."
>>>     cdef Py_ssize_t i
>>>     cdef int a0 = a.shape[0], count = 0
>>>     cdef np.float64_t asum = 0, a2sum=0, amean, ai, da
>>>     for i in range(a0):
>>>         ai = a[i]
>>>         if ai == ai:
>>>             asum += ai
>>>             count += 1
>>>     if count > 0:
>>>         amean = asum / count
>>>         asum = 0
>>>         for i in range(a0):
>>>             ai = a[i]
>>>             if ai == ai:
>>>                 da = ai - amean
>>>                 asum += da
>>>                 a2sum += (da * da)
>>>         asum = asum * asum
>>>         return sqrt((a2sum - asum / count) / (count - ddof))
>>>     else:
>>>         return np.float64(NAN)
>>
>> This is 5% faster:
>>
>> @cython.boundscheck(False)
>> @cython.wraparound(False)
>> def nanstd_1d_float64_axis0_2(np.ndarray[np.float64_t, ndim=1] a, int ddof):
>>     "nanstd of 1d numpy array with dtype=np.float64 along axis=0."
>>     cdef Py_ssize_t i
>>     cdef int a0 = a.shape[0], count = 0
>>     cdef np.float64_t asum = 0, amean, ai
>>     for i in range(a0):
>>         ai = a[i]
>>         if ai == ai:
>>             asum += ai
>>             count += 1
>>     if count > 0:
>>         amean = asum / count
>>         asum = 0
>>         for i in range(a0):
>>             ai = a[i]
>>             if ai == ai:
>>                 ai -= amean
>>                 asum += (ai * ai)
>>         return sqrt(asum / (count - ddof))
>>     else:
>>         return np.float64(NAN)
>
> I think it would be better to write nanvar instead of nanstd and take
> the square root only in a delegating nanstd, instead of the other way
> around. (Also a change that should be made in scipy.stats)
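Concretely, the split being suggested would look something like this
(a minimal sketch, not actual Nanny code: the function names and the
cimports are assumptions, and the loops are just the faster two-pass
code quoted above with the sqrt moved out):

    import numpy as np
    cimport numpy as np
    cimport cython
    from libc.math cimport sqrt, NAN

    @cython.boundscheck(False)
    @cython.wraparound(False)
    def nanvar_1d_float64_axis0(np.ndarray[np.float64_t, ndim=1] a, int ddof):
        "nanvar of 1d numpy array with dtype=np.float64 along axis=0."
        cdef Py_ssize_t i
        cdef int a0 = a.shape[0], count = 0
        cdef np.float64_t asum = 0, amean, ai
        # First pass: sum and count the non-NaN values. (ai == ai is
        # False only when ai is NaN.)
        for i in range(a0):
            ai = a[i]
            if ai == ai:
                asum += ai
                count += 1
        if count > 0:
            amean = asum / count
            asum = 0
            # Second pass: accumulate squared deviations from the mean.
            for i in range(a0):
                ai = a[i]
                if ai == ai:
                    ai -= amean
                    asum += ai * ai
            return asum / (count - ddof)
        else:
            return np.float64(NAN)

    def nanstd_1d_float64_axis0(np.ndarray[np.float64_t, ndim=1] a, int ddof):
        "Delegating nanstd: just the square root of nanvar."
        # sqrt of a negative variance (ddof > count) comes out as NaN,
        # so the too-large-ddof case is handled for free.
        return sqrt(nanvar_1d_float64_axis0(a, ddof))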
Yeah, I noticed that numpy does that. I was planning to have separate
var and std functions, though maybe I should template it instead,
since the sqrt automatically converts a too-large ddof to NaN. Here's
why (from the readme file):

Under the hood Nanny uses a separate Cython function for each
combination of ndim, dtype, and axis. A lot of the overhead in
ny.nanmax, for example, is in checking that your axis is within range,
converting non-array data to an array, and selecting the function to
use to calculate nanmax.

You can get rid of the overhead by doing all this before you, say,
enter an inner loop:

    >>> arr = np.random.rand(10,10)
    >>> axis = 0
    >>> func, a = ny.func.nanmax_selector(arr, axis)
    >>> func.__name__
    'nanmax_2d_float64_axis0'

Let's see how much faster that runs:

    >> timeit np.nanmax(arr, axis=0)
    10000 loops, best of 3: 25.7 us per loop
    >> timeit ny.nanmax(arr, axis=0)
    100000 loops, best of 3: 5.25 us per loop
    >> timeit func(a)
    100000 loops, best of 3: 2.5 us per loop

Note that func is faster than NumPy's non-nan version of max:

    >> timeit arr.max(axis=0)
    100000 loops, best of 3: 3.28 us per loop

So adding NaN protection to your inner loops has a negative cost!
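To make the "inner loop" point concrete, usage would look something
like this (a sketch, not from the readme: it assumes the package
imports as "nanny", and that every array in the stream has the same
shape and dtype as the first one, so the selected function applies to
all of them):

    import numpy as np
    import nanny as ny  # assumed import name for the Nanny package

    # A stream of arrays that all share shape (10, 10) and dtype float64.
    arrs = [np.random.rand(10, 10) for _ in range(1000)]

    # Pay the axis check, array conversion, and function lookup once,
    # outside the loop; a is the first array, already converted.
    axis = 0
    func, a = ny.func.nanmax_selector(arrs[0], axis)

    # Inner loop: call the specialized Cython function directly. The
    # arrays are already float64 ndarrays, so no further conversion
    # should be needed.
    results = [func(arr) for arr in arrs]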
