On Mon, Nov 22, 2010 at 1:59 PM, Keith Goodman <kwgood...@gmail.com> wrote:
> On Mon, Nov 22, 2010 at 10:51 AM,  <josef.p...@gmail.com> wrote:
>> On Mon, Nov 22, 2010 at 1:39 PM, Keith Goodman <kwgood...@gmail.com> wrote:
>>> On Mon, Nov 22, 2010 at 10:32 AM,  <josef.p...@gmail.com> wrote:
>>>> On Mon, Nov 22, 2010 at 1:26 PM, Keith Goodman <kwgood...@gmail.com> wrote:
>>>>> On Mon, Nov 22, 2010 at 9:03 AM, Keith Goodman <kwgood...@gmail.com> wrote:
>>>>>
>>>>>> @cython.boundscheck(False)
>>>>>> @cython.wraparound(False)
>>>>>> def nanstd_twopass(np.ndarray[np.float64_t, ndim=1] a, int ddof):
>>>>>>    "nanstd of 1d numpy array with dtype=np.float64 along axis=0."
>>>>>>    cdef Py_ssize_t i
>>>>>>    cdef int a0 = a.shape[0], count = 0
>>>>>>    cdef np.float64_t asum = 0, a2sum=0, amean, ai, da
>>>>>>    for i in range(a0):
>>>>>>        ai = a[i]
>>>>>>        if ai == ai:  # False only when ai is NaN
>>>>>>            asum += ai
>>>>>>            count += 1
>>>>>>    if count > 0:
>>>>>>        amean = asum / count
>>>>>>        asum = 0
>>>>>>        for i in range(a0):
>>>>>>            ai = a[i]
>>>>>>            if ai == ai:
>>>>>>                da = ai - amean
>>>>>>                asum += da
>>>>>>                a2sum += (da * da)
>>>>>>        asum = asum * asum
>>>>>>        return sqrt((a2sum - asum / count) / (count - ddof))
>>>>>>    else:
>>>>>>        return np.float64(NAN)
>>>>>
>>>>> This is 5% faster:
>>>>>
>>>>> @cython.boundscheck(False)
>>>>> @cython.wraparound(False)
>>>>> def nanstd_1d_float64_axis0_2(np.ndarray[np.float64_t, ndim=1] a, int ddof):
>>>>>    "nanstd of 1d numpy array with dtype=np.float64 along axis=0."
>>>>>    cdef Py_ssize_t i
>>>>>    cdef int a0 = a.shape[0], count = 0
>>>>>    cdef np.float64_t asum = 0, amean, ai
>>>>>    for i in range(a0):
>>>>>        ai = a[i]
>>>>>        if ai == ai:
>>>>>            asum += ai
>>>>>            count += 1
>>>>>    if count > 0:
>>>>>        amean = asum / count
>>>>>        asum = 0
>>>>>        for i in range(a0):
>>>>>            ai = a[i]
>>>>>            if ai == ai:
>>>>>                ai -= amean
>>>>>                asum += (ai * ai)
>>>>>        return sqrt(asum / (count - ddof))
>>>>>    else:
>>>>>        return np.float64(NAN)
>>>>
>>>> I think it would be better to write nanvar instead of nanstd and take
>>>> the square root only in a delegating nanstd, instead of the other way
>>>> around. (Also a change that should be made in scipy.stats)
>>>
>>> Yeah, I noticed that NumPy does that. I was planning to have separate
>>> var and std functions. Here's why (from the readme file, but maybe I
>>> should template it; the sqrt automatically converts large ddof to
>>> NaN):
>>
>> I'm not sure what you are saying; dropping the square root in the
>> function doesn't require nan handling in the inner loop. If you want
>> to return nan when count - ddof <= 0, then you could just replace
>>
>> if count > 0:
>> ...
>>
>> by
>> if count - ddof > 0:
>> ...
>>
>> Or am I missing the point?
>
> Yes, sorry. Ignore the sqrt/nan comment. The point is that I want to
> be able to return the underlying, low-level Cython function (see
> below). So I need separate var and std versions to do that (unless I
> can modify default kwargs on the fly).

I think you could still delegate at the Cython level, but given the
amount of code duplication that is required, I don't expect it to make
much difference for staying DRY.
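
For what it's worth, here is roughly what I mean as a plain-Python
sketch (the nanvar body is just a NumPy stand-in for the low-level
kernel, not Nanny's actual code):

import numpy as np

def nanvar(a, ddof=0):
    # plain NumPy stand-in for a specialized Cython kernel
    a = np.asarray(a, dtype=np.float64)
    mask = ~np.isnan(a)
    count = mask.sum()
    if count - ddof <= 0:  # too few observations -> NaN
        return np.float64(np.nan)
    amean = a[mask].sum() / count
    return ((a[mask] - amean) ** 2).sum() / (count - ddof)

def nanstd(a, ddof=0):
    # delegate: std is just the square root of var
    return np.sqrt(nanvar(a, ddof=ddof))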

Josef

>
>>> Under the hood Nanny uses a separate Cython function for each
>>> combination of ndim, dtype, and axis. A lot of the overhead in
>>> ny.nanmax, for example, is in checking that your axis is within range,
>>> converting non-array data to an array, and selecting the function to
>>> use to calculate nanmax.
>>>
>>> You can get rid of the overhead by doing all this before you, say,
>>> enter an inner loop:
>>>
>>>>>> arr = np.random.rand(10,10)
>>>>>> axis = 0
>>>>>> func, a = ny.func.nanmax_selector(arr, axis)
>>>>>> func.__name__
>>> 'nanmax_2d_float64_axis0'
>>>
>>> Let's see how much faster that runs:
>>>
>>>>> timeit np.nanmax(arr, axis=0)
>>> 10000 loops, best of 3: 25.7 us per loop
>>>>> timeit ny.nanmax(arr, axis=0)
>>> 100000 loops, best of 3: 5.25 us per loop
>>>>> timeit func(a)
>>> 100000 loops, best of 3: 2.5 us per loop
>>>
>>> Note that func is faster than NumPy's non-nan version of max:
>>>
>>>>> timeit arr.max(axis=0)
>>> 100000 loops, best of 3: 3.28 us per loop
>>>
>>> So adding NaN protection to your inner loops has a negative cost!
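
A rough idea of what such a selector could look like (the dict
dispatch keyed on (ndim, dtype, axis) is a guess at the mechanism, and
the kernel below is only a NumPy stand-in, not Nanny's compiled code):

import numpy as np

def _nanmax_2d_float64_axis0(a):
    # stand-in for the specialized compiled kernel
    return np.nanmax(a, axis=0)

# hypothetical registry: one kernel per (ndim, dtype, axis) combination
_nanmax_funcs = {(2, np.dtype(np.float64), 0): _nanmax_2d_float64_axis0}

def nanmax_selector(arr, axis):
    # do the conversion and argument checking once, up front
    a = np.asarray(arr)
    if not 0 <= axis < a.ndim:
        raise ValueError("axis out of range")
    return _nanmax_funcs[(a.ndim, a.dtype, axis)], a

# pay the overhead once, then call the bare kernel in a tight loop
func, a = nanmax_selector(np.random.rand(10, 10), axis=0)
result = func(a)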