Re: [Cython] buffer syntax vs. memory view syntax

Dag Sverre Seljebotn Tue, 08 May 2012 02:47:37 -0700

On 05/08/2012 11:30 AM, Dag Sverre Seljebotn wrote:

On 05/08/2012 11:22 AM, mark florisson wrote:

On 8 May 2012 09:36, Dag Sverre Seljebotn <d.s.seljeb...@astro.uio.no>
wrote:

On 05/08/2012 10:18 AM, Stefan Behnel wrote:


Dag Sverre Seljebotn, 08.05.2012 09:57:


On 05/07/2012 11:21 PM, mark florisson wrote:


On 7 May 2012 19:40, Dag Sverre Seljebotn wrote:


mark florisson wrote:


On 7 May 2012 17:00, Dag Sverre Seljebotn wrote:


On 05/07/2012 04:16 PM, Stefan Behnel wrote:


Stefan Behnel, 07.05.2012 15:04:


Dag Sverre Seljebotn, 07.05.2012 13:48:


BTW, with the coming of memoryviews, me and Mark talked
about just
deprecating the "mytype[...]" meaning buffers, and rather
treat it
as np.ndarray, array.array etc. being some sort of "template
types".
That is,
we disallow "object[int]" and require some special
declarations in
the relevant pxd files.



Hmm, yes, it's unfortunate that we have two different types of
syntax now,
one that declares the item type before the brackets and one that
declares it afterwards.


Should we consider the
buffer interface syntax deprecated and focus on the memory view
syntax?



I think that's the very-long-term intention. Then again, it may be
too early
to really tell yet, we just need to see how the memory views
play out
in
real life and whether they'll be able to replace
np.ndarray[double]
among real users. We don't want to shove things down users'
throats.

But the use of the trailing-[] syntax needs some cleaning up.
Me and
Mark agreed we'd put this proposal forward when we got around
to it:

- Deprecate the "object[double]" form, where [dtype] can be stuck
on
any extension type

- But, do NOT (for the next year at least) deprecate
np.ndarray[double],
array.array[double], etc. Basically, there should be a magic flag
in
extension type declarations saying "I can be a buffer".

For one thing, that is sort of needed to open up things for
templated
cdef classes/fused types cdef classes, if that is ever
implemented.



Deprecating is definitely a good start. I think if you only
allow two types as buffers it will be at least reasonably clear
when one is dealing with fused types or buffers.

Basically, I think memoryviews should live up to demands of the
users,
which would mean there would be no reason to keep the buffer
syntax.



But they are different approaches -- use a different type/API, or
just
try to speed up parts of NumPy.

One thing to do is make memoryviews coerce cheaply back to the
original objects if wanted (which is likely). Writing
np.asarray(mymemview) is kind of annoying.



It is going to be very confusing to have type(mymemview),
repr(mymemview), and so on come out as NumPy arrays, but not have
the
full API of NumPy. Unless you auto-convert on getattr to...



Yeah, the idea is very simple, as you mention: just keep the
object
around cached, and when you slice construct one lazily.

If you want to eradicate the distinction between the backing
array and
the memory view and make it transparent, I really suggest you
kick back
alive np.ndarray (it can exist in some 'unrealized' state with
delayed
construction after slicing, and so on). Implementation much the same
either way, it is all about how it is presented to the user.



You mean the buffer syntax?

Something like mymemview.asobject() could work though, and while not
much shorter, it would have some polymorphism that np.asarray
does not
have (based probably on some custom PEP 3118 extension)



I was thinking you could allow the user to register a callback, and
use that to coerce from a memoryview back to an object (given a
memoryview object). For numpy this would be np.asarray, and the
implementation is allowed to cache the result (which it will).
It may be too magicky though... but it will be convenient. The
memoryview will act as a subclass, meaning that any of its methods
will override methods of the converted object.
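
A minimal sketch of the callback idea in plain Python, not Cython. The names
register_coercer, CachedView and asobject are hypothetical and only
illustrate "register a callback, coerce back on demand, cache the result".
array.array stands in for NumPy here; for NumPy the registered callback
would be np.asarray, as described above.

```python
import array

# Hypothetical registry: maps an exporter type to a callback that rebuilds
# an object of that type from a memoryview. This is not an existing Cython API.
_coercers = {}


def register_coercer(exporter_type, callback):
    """Remember how to turn a memoryview back into `exporter_type`."""
    _coercers[exporter_type] = callback


class CachedView:
    """Wraps a memoryview and lazily coerces it back to an object."""

    def __init__(self, obj):
        self.view = memoryview(obj)
        self._exporter_type = type(obj)
        self._cached = None

    def asobject(self):
        # Build the object on first use and reuse it afterwards.
        if self._cached is None:
            callback = _coercers[self._exporter_type]
            self._cached = callback(self.view)
        return self._cached


# Stand-in callback: rebuild an array.array from the view's format and items.
register_coercer(
    array.array,
    lambda view: array.array(view.format, view.tolist()),
)

cv = CachedView(array.array("d", [1.0, 2.0, 3.0]))
first = cv.asobject()
second = cv.asobject()  # served from the cache, same object as `first`
```

The cache is what makes repeated coercion cheap; whether attribute access
should trigger it implicitly is the point under debate.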



My point was that this seems *way* too magicky.

Beyond "confusing users" and so on that are sort of subjective,
here's a
fundamental problem for you: We're making it very difficult to
type-infer
memoryviews. Consider:

cdef double[:] x = ...
y = x
print y.shape

Now, because y is not typed, you're semantically throwing in a
conversion
on line 2, so that line 3 says that you want the attribute access
to be
invoked on "whatever object x coerced back to". And we have no idea
what
kind of object that is.

If you don't transparently convert to object, it'd be safe to
automatically
infer y as a double[:].



Why can't y be inferred as the type of x due to the assignment?

On a related note, I've said before that I dislike the notion of

cdef double[:] mview = obj

I'd rather like

cdef double[:] mview = double[:](obj)



Why? We currently allow

cdef char* s = some_py_bytes_string

Auto-coercion is a serious part of the language, and I don't see the
advantage of requiring the redundancy in the case above. It's clear
enough
to me what the typed assignment is intended to mean: get me a buffer
view
on the object, regardless of what it is.
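
As a rough plain-Python analogue of "get me a buffer view on the object,
regardless of what it is", the builtin memoryview() acquires a PEP 3118
buffer from any exporter. The helper name total is hypothetical; the typed
Cython assignment itself is not shown here.

```python
import array


def total(obj):
    """Sum the items of any 1-D PEP 3118 buffer exporter."""
    # memoryview() raises TypeError if obj does not export a buffer.
    view = memoryview(obj)
    return sum(view)


bytes_total = total(b"\x01\x02\x03")                # items are ints
array_total = total(array.array("d", [1.5, 2.5]))   # items are floats

try:
    total([1, 2, 3])  # a list is not a buffer exporter
    list_rejected = False
except TypeError:
    list_rejected = True
```

The function never inspects the concrete type of obj; it only relies on the
buffer protocol.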

I support Robert in that "np.ndarray[double]" is the syntax to use
when
you
want this kind of transparent "be an object when I need to and a
memory
view when I need to".

Proposal:

1) We NEVER deprecate "np.ndarray[double]", we commit to keeping
that in
the language. It means exactly what you would like double[:] to mean,
i.e.
a variable that is memoryview when you need to and an object
otherwise.
When you use this type, you bear the consequences of early-binding
things
that could in theory be overridden.

2) double[:] is for when you want to access data of *any* Python
object
in
a generic way. Raw PEP 3118. In those situations, access to the
underlying
object is much less useful.

2a) Therefore we require that you do "mview.asobject()" manually;
doing
"mview.foo()" is a compile-time error



Sounds good. I think that would clean up the current syntax overlap
very
nicely.

2b) To drive the point home among users, and aid type inference and
overall language clarity, we REMOVE the auto-acquisition and
require that
you do

cdef double[:] mview = double[:](obj)



I don't see the point, as noted above. Either "obj" is statically typed
and
the bare assignment becomes a no-op, or it's not typed and the
assignment
coerces by creating a view. As with all other typed assignments.

2c) Perhaps: Do not even coerce to a Python memoryview and disallow
"print mview"; instead require that you do "print
mview.asmemoryview()"
or
"print memoryview(mview)" or somesuch.



This seems to depend on 2b.



This I don't understand. The question of 2c) is the analogue to
auto-coercion of "char*" to bytes; approving 2c) would put
memoryviews in
line with char*.

Then again, we could in future auto-coerce char* to a ctypes pointer,
and in
that case, coercing a memoryview to an object representing that
memoryview
would be OK.


Character pointers coerce to strings. Hell, even structs coerce to and
from python dicts, so disallowing the same for memoryviews would just
be inconsistent and inconvenient.


OK, but even structs don't coerce back to some arbitrary type, it's
always a dict. I don't necessarily oppose coercing memoryviews to some
Python memoryview object (not necessarily the builtin).

I agree that some mview.asobject() triggering a callback defined by some
CEP 1xxx ("cross-language CEP") would be really useful; and that could
form the basis of a new, improved np.ndarray[double] that allows fast
slicing etc. (where that is used automatically whenever needed).

After some thinking I believe I can see more clearly where Mark is coming from. To sum up, it's either

A) Keep both np.ndarray[double] and double[:] around, with clearly defined and separate roles. np.ndarray[double] implementation is revamped to allow fast slicing etc., based on the double[:] implementation.

B) Deprecate np.ndarray[double] sooner rather than later, but make double[:] have functionality that is *really* close to what np.ndarray[double] currently does. In most cases one should be able to basically replace np.ndarray[double] with double[:] and the code should continue to work just like before; difference is that if you pass in anything else than a NumPy array, it will likely fail with a runtime AttributeError at some point rather than fail a PyType_Check.
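
The difference in failure mode can be sketched in plain Python, with
array.array standing in for a NumPy array. The functions strict and lenient
are hypothetical: strict mimics an up-front type check, lenient mimics
accepting any buffer and only failing once an array-specific attribute is
used.

```python
import array


def strict(obj):
    # Analogue of A): reject wrong types at the boundary.
    if not isinstance(obj, array.array):
        raise TypeError("expected array.array")
    return obj.typecode


def lenient(obj):
    # Analogue of B): any buffer exporter is accepted here...
    view = memoryview(obj)
    nbytes = view.nbytes
    # ...and the failure only shows up when the object API is used.
    return obj.typecode, nbytes


def failure_kind(func, obj):
    """Return the name of the exception func(obj) raises, or None."""
    try:
        func(obj)
    except Exception as exc:
        return type(exc).__name__
    return None


strict_failure = failure_kind(strict, bytearray(8))
lenient_failure = failure_kind(lenient, bytearray(8))
lenient_ok = lenient(array.array("d", [1.0]))
```

With a bytearray, strict fails immediately with a TypeError, while lenient
gets past buffer acquisition and fails later with an AttributeError.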

Between those two I believe it's a matter of design taste, not so much rational argument, and I don't know where I stand yet. And I'm going to stop thinking about it until I see what Robert says...


Dag
