On 05/07/2012 01:10 PM, Stefan Behnel wrote:
Dag Sverre Seljebotn, 07.05.2012 12:40:
moving to dev list

Makes sense.

On 05/07/2012 11:17 AM, Stefan Behnel wrote:
Dag Sverre Seljebotn, 07.05.2012 10:44:
On 05/07/2012 07:48 AM, Stefan Behnel wrote:
I wonder why a memory view should be allowed to be None in the first
place.
Buffer arguments aren't (because they get unpacked on entry), so why
should memory views?

? At least when I implemented it, buffers get unpacked, but the case of a
None buffer is treated specially: you're fully allowed to pass None (and you
segfault if you index it with []).
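
(To make the failure mode concrete, a sketch of my own, not code from the thread: the None value is accepted at the call, and the crash only happens once the buffer is indexed.)

cimport numpy as np

def first(np.ndarray[double] a):
    # `a` may arrive as None; the buffer is simply not acquired in that case
    return a[0]     # with a == None this indexes through a NULL pointer -> crash

# first(None)   # accepted at the call site, crashes at the a[0] above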

Hmm, ok, maybe I just got confused by the code then.

I think the docs should state that buffer arguments are best used together
with the "not None" declaration then.

... which made me realise that that wasn't even supported. I can't believe
no-one ever reported that as a bug...

https://github.com/cython/cython/commit/f2de49fd0ac82a02a070b931bf4d2dab47135d0b

It's still not supported for memory views.

BTW, is there a reason why we shouldn't allow a "not None" declaration for
cdef functions? Obviously, the caller would have to do the check in that
case. Hmm, maybe it's not that important, because None checks are best done
at entry points from user code, which usually means Python code. It seems
like "not None" is not supported on cpdef functions, though.


I use them with "=None" default values all the time... then do a
None-check manually.

Interesting. Could you give an example? What's the advantage over letting
Cython raise an error for you? And, since you are using it as a default
argument, why would someone want to call your code entirely without a
buffer argument?

Here you go:

def foo(np.ndarray[double] a, np.ndarray[double] out=None):
    if out is None:
        out = np.empty_like(a)
    # compute result in out
    return out

The pattern of handing in the memory area to write to is one of the fundamentals of numerical computing; you often just can't implement an algorithm if the called function insists on returning its result in a newly allocated array. I can explain why that is in detail, but I'd rather you just trusted the testimony of somebody doing numerical computation...
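
(To make that concrete, a small usage sketch of my own, names purely illustrative: the caller allocates the output once and reuses it, so no new array is allocated per call inside the loop.)

import numpy as np

a = np.random.rand(1000000)
out = np.empty_like(a)        # allocated once, up front

for step in range(100):
    foo(a, out)               # foo writes into `out` in place, no new array per call
    # ... consume `out` before the next iteration ...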

It's just a convenience, but often (in particular when testing) it's incredibly convenient to not have to bother with allocating the output array.

Another pattern is:

def do_something(np.ndarray[double] a,
                 np.ndarray[double] sin_of_a=None):
    ...

so if your caller happens to have already computed the "something", the function uses it; but OTOH the "something" is a function of the inputs and can be computed on the fly. AND, sometimes it can be computed on the fly in ways more efficient than what the caller could have done, because of memory bus issues etc.

Both of these can be "fixed" by a) not allowing the convenient shorthand, or b) declaring the argument "object" first and then typing it after the "preamble" (sketched below).
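
(A sketch of what option b) would look like, my wording rather than code from the thread: declare the argument as a plain object so it may be None, handle the None case in the preamble, and only then bind it to a typed buffer variable.)

import numpy as np
cimport numpy as np

def foo(np.ndarray[double] a, object out=None):
    if out is None:
        out = np.empty_like(a)
    cdef np.ndarray[double] out_buf = out   # buffer unpacking happens here
    # compute result into out_buf
    return out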

So the REAL reason I'm arguing this case is consistency with cdef classes.

It's really no different from cdef classes.

I find it at least a bit more surprising because a buffer unpacking
argument is a rather strong hint that you expect something that supports
this protocol. The fact that you type your function argument with it hints
at the intention to properly unpack it on entry. I'm sure there are lots of
users who were or will be surprised when they realise that that doesn't
exclude None values.

Whereas I think there would be more users surprised by the opposite.

So there -- we won't know who's right without actually finding some users. And chances are we are both right, since users are different from one another.



And I remember that we wanted to change the default settings for extension
type arguments from "or None" to "not None" years ago but never actually
did it.

I remember that there was such a debate, but I certainly don't remember
that this was the conclusion :-)

Maybe not, yes.


I didn't agree with that view then and
I don't now. I don't remember what Robert's view was...

As far as I can remember (which might be biased towards my personal
view), the conclusion was that we left the current semantics in place,
relying on better control flow analysis to make None-checks cheaper, and
when those are cheap enough, make the nonecheck directive default to
True.
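
(For reference, a sketch of how that directive is switched on today; the class and names are just illustrative.)

# cython: nonecheck=True      # module-wide, at the top of the .pyx file

# or per function:
cimport cython

cdef class Point:
    cdef public double x

@cython.nonecheck(True)
def get_x(Point p):
    # with nonecheck enabled, a None `p` raises an exception here
    # instead of dereferencing a NULL pointer
    return p.x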

At least for buffer arguments, it silently corrupts data or segfaults in
the current state of affairs, as you pointed out. Not exactly ideal.

No different than writing to a field in a cdef class...


That's another reason why I see a difference between the behaviour of
extension types and that of buffer arguments. Buffer indexing is also way
more performance critical than the average method call or attribute access
on a cdef class.

Perhaps, but that's a bit hand-wavy to turn into a principle of language design? "This is performance critical, so therefore we suddenly invert the normal rule"?

I just think we should be consistent, not have more special rules for buffers than we need to.

The intention all along was that "np.ndarray[double]" is just a glorified "np.ndarray". People expect it to behave like an optimized "np.ndarray". If "np.ndarray" can be None, why can't "np.ndarray[double]"?

BTW, with the coming of memoryviews, Mark and I talked about deprecating the "mytype[...]" buffer syntax altogether, and instead treating np.ndarray, array.array etc. as some sort of "template types". That is, we disallow "object[int]" and require some special declarations in the relevant pxd files.

(Java is sort of prior art showing that this can indeed be done?)

Java was designed to have a JIT compiler underneath which handles external
parameters, and its compilers are way smarter than Cython. I agree that
there is still a lot we can do based on better static analysis, but there
will always be limits.

Any static analysis will be able to get you to the point of "not None" if the user has a manual test. And the Python way is often to spell things out rather than go for brevity; I think an explicit if-test is much more newbie-friendly than "not None", "or None", etc.
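
(The two spellings being compared, in a sketch of my own:)

cimport numpy as np

# declaration syntax: Cython inserts the check at function entry
def f(np.ndarray a not None):
    return a.sum()

# explicit test: spells the intent out, and flow analysis can learn from
# it that `a` is no longer None below the raise
def g(np.ndarray a):
    if a is None:
        raise TypeError("a must not be None")
    return a.sum()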

Performance beyond that is rather theoretical for the moment.

I agree that for memoryviews that can be passed in acquired state to cdef functions there is the question of eliminating an extra branch or so, but that is still far-fetched, and I'd rather Mark raise the issue if it becomes an issue than have the two of us bikeshed over it.

I'll try to make this my last post to this thread, I feel we're slipping into Dag-and-Stefan-endless-thread territory...

Dag
_______________________________________________
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel
