On 05/07/2012 01:10 PM, Stefan Behnel wrote:
Dag Sverre Seljebotn, 07.05.2012 12:40:
moving to dev list

Makes sense.

On 05/07/2012 11:17 AM, Stefan Behnel wrote:
Dag Sverre Seljebotn, 07.05.2012 10:44:
On 05/07/2012 07:48 AM, Stefan Behnel wrote:
I wonder why a memory view should be allowed to be None in the first
place.
Buffer arguments aren't (because they get unpacked on entry), so why
should memory views?

? At least when I implemented it, buffers get unpacked, but the case of a
None buffer is treated specially: you're fully allowed to pass None (and you
segfault if you index it with []).
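
(To make the failure mode concrete, a sketch of my own, not code from the thread: the None value is accepted at the call, and the crash only happens once the buffer is indexed.)

cimport numpy as np

def first(np.ndarray[double] a):
    # `a` may arrive as None; the buffer is simply not acquired in that case
    return a[0]     # with a == None this indexes through a NULL pointer -> crash

# first(None)   # accepted at the call site, crashes at the a[0] above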

Hmm, ok, maybe I just got confused by the code then.

I think the docs should state that buffer arguments are best used together
with the "not None" declaration then.

... which made me realise that that wasn't even supported. I can't believe
no-one ever reported that as a bug...

https://github.com/cython/cython/commit/f2de49fd0ac82a02a070b931bf4d2dab47135d0b

It's still not supported for memory views.

BTW, is there a reason why we shouldn't allow a "not None" declaration for
cdef functions? Obviously, the caller would have to do the check in that
case. Hmm, maybe it's not that important, because None checks are best done
at entry points from user code, which usually means Python code. It seems
like "not None" is not supported on cpdef functions, though.


I use them with "=None" default values all the time... then do a
None-check manually.

Interesting. Could you give an example? What's the advantage over letting
Cython raise an error for you? And, since you are using it as a default
argument, why would someone want to call your code entirely without a
buffer argument?

Here you go:

def foo(np.ndarray[double] a, np.ndarray[double] out=None):
    if out is None:
        out = np.empty_like(a)
    # compute result in out
    return out

The pattern of handing in the memory area to write to is one of the fundamentals of numerical computing; you often just can't implement an algorithm if the called function insists on returning its result in a newly allocated array. I can explain why that is in detail, but I'd rather you just trusted the testimony of somebody doing numerical computation...
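
(To make that concrete, a small usage sketch of my own, names purely illustrative: the caller allocates the output once and reuses it, so no new array is allocated per call inside the loop.)

import numpy as np

a = np.random.rand(1000000)
out = np.empty_like(a)        # allocated once, up front

for step in range(100):
    foo(a, out)               # foo writes into `out` in place, no new array per call
    # ... consume `out` before the next iteration ...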

It's just a convenience, but often (in particular when testing) it's incredibly convenient to not have to bother with allocating the output array.

Another pattern is:

def do_something(np.ndarray[double] a,
                 np.ndarray[double] sin_of_a=None):
    ...

so if your caller happens to have already computed the "something", the function uses it; but OTOH the "something" is a function of the inputs and can be computed on the fly. AND, sometimes it can be computed on the fly in ways more efficient than what the caller could have done, because of memory bus issues etc.

Both of these can be "fixed" by a) not allowing the convenient shorthand, or b) declaring the argument "object" first and then typing it after the "preamble" (sketched below).
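
(A sketch of what option b) would look like, my wording rather than code from the thread: declare the argument as a plain object so it may be None, handle the None case in the preamble, and only then bind it to a typed buffer variable.)

import numpy as np
cimport numpy as np

def foo(np.ndarray[double] a, object out=None):
    if out is None:
        out = np.empty_like(a)
    cdef np.ndarray[double] out_buf = out   # buffer unpacking happens here
    # compute result into out_buf
    return out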

So the REAL reason I'm arguing this case is consistency with cdef classes.

It's really no different from cdef classes.

I find it at least a bit more surprising because a buffer unpacking
argument is a rather strong hint that you expect something that supports
this protocol. The fact that you type your function argument with it hints
at the intention to properly unpack it on entry. I'm sure there are lots of
users who were or will be surprised when they realise that that doesn't
exclude None values.

Whereas I think there would be more users surprised by the opposite.

So there -- we won't know who's right without actually finding some users. And chances are we are both right, since users are different from one another.



And I remember that we wanted to change the default settings for extension
type arguments from "or None" to "not None" years ago but never actually
did it.

I remember that there was such a debate, but I certainly don't remember
that this was the conclusion :-)

Maybe not, yes.


I didn't agree with that view then and
I don't now. I don't remember what Robert's view was...

As far as I can remember (which might be biased towards my personal
view), the conclusion was that we left the current semantics in place,
relying on better control flow analysis to make None-checks cheaper, and
when those are cheap enough, make the nonecheck directive default to
True.
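
(For reference, a sketch of how that directive is switched on today; the class and names are just illustrative.)

# cython: nonecheck=True      # module-wide, at the top of the .pyx file

# or per function:
cimport cython

cdef class Point:
    cdef public double x

@cython.nonecheck(True)
def get_x(Point p):
    # with nonecheck enabled, a None `p` raises an exception here
    # instead of dereferencing a NULL pointer
    return p.x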

At least for buffer arguments, it silently corrupts data or segfaults in
the current state of affairs, as you pointed out. Not exactly ideal.

No different than writing to a field in a cdef class...


That's another reason why I see a difference between the behaviour of
extension types and that of buffer arguments. Buffer indexing is also way
more performance critical than the average method call or attribute access
on a cdef class.

Perhaps, but that's a bit hand-wavy to turn into a principle of language design? "This is performance critical, so therefore we suddenly invert the normal rule"?

I just think we should be consistent, not have more special rules for buffers than we need to.

The intention all along was that "np.ndarray[double]" is just a glorified "np.ndarray". People expect it to behave like an optimized "np.ndarray". If "np.ndarray" can be None, why can't "np.ndarray[double]"?

BTW, with the coming of memoryviews, Mark and I talked about deprecating the "mytype[...]" buffer syntax altogether, and instead treating np.ndarray, array.array etc. as some sort of "template types". That is, we disallow "object[int]" and require some special declarations in the relevant pxd files.

(Java is sort of prior art showing that this can indeed be done?)

Java was designed to have a JIT compiler underneath which handles external
parameters, and its compilers are way smarter than Cython. I agree that
there is still a lot we can do based on better static analysis, but there
will always be limits.

Any static analysis will be able to get you to the point of "not None" if the user has a manual test. And the Python way is often to spell things out rather than go for brevity; I think an explicit if-test is much more newbie-friendly than "not None", "or None", etc.
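
(The two spellings being compared, in a sketch of my own:)

cimport numpy as np

# declaration syntax: Cython inserts the check at function entry
def f(np.ndarray a not None):
    return a.sum()

# explicit test: spells the intent out, and flow analysis can learn from
# it that `a` is no longer None below the raise
def g(np.ndarray a):
    if a is None:
        raise TypeError("a must not be None")
    return a.sum()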

Performance beyond that is rather theoretical for the moment.

I agree that for memoryviews that can be passed in acquired state to cdef functions there is the question of eliminating an extra branch or so, but that is still far-fetched, and I'd rather Mark raise the issue if it becomes an issue than have the two of us bikeshed over it.

I'll try to make this my last post to this thread, I feel we're slipping into Dag-and-Stefan-endless-thread territory...

Dag
_______________________________________________
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel
