On 05/07/2012 11:21 PM, mark florisson wrote:
On 7 May 2012 19:40, Dag Sverre Seljebotn<d.s.seljeb...@astro.uio.no> wrote:
mark florisson<markflorisso...@gmail.com> wrote:
On 7 May 2012 17:00, Dag Sverre Seljebotn<d.s.seljeb...@astro.uio.no> wrote:
On 05/07/2012 04:16 PM, Stefan Behnel wrote:
Stefan Behnel, 07.05.2012 15:04:
Dag Sverre Seljebotn, 07.05.2012 13:48:
BTW, with the coming of memoryviews, me and Mark talked about just
deprecating the "mytype[...]" meaning buffers, and rather treat it as
np.ndarray, array.array etc. being some sort of "template types". That is,
we disallow "object[int]" and require some special declarations in the
relevant pxd files.
Hmm, yes, it's unfortunate that we have two different types of syntax now,
one that declares the item type before the brackets and one that declares
it afterwards.
I actually think this merits some more discussion. Should we consider the
buffer interface syntax deprecated and focus on the memory view syntax?
I think that's the very-long-term intention. Then again, it may be too early
to really tell yet, we just need to see how the memory views play out in
real life and whether they'll be able to replace np.ndarray[double] among
real users. We don't want to shove things down users' throats.
But the use of the trailing-[] syntax needs some cleaning up. Me and Mark
agreed we'd put this proposal forward when we got around to it:
- Deprecate the "object[double]" form, where [dtype] can be stuck on any
  extension type
- But, do NOT (for the next year at least) deprecate np.ndarray[double],
  array.array[double], etc. Basically, there should be a magic flag in
  extension type declarations saying "I can be a buffer".
For one thing, that is sort of needed to open up things for templated cdef
classes/fused types cdef classes, if that is ever implemented.
Deprecating is definitely a good start. I think if you only allow two
types as buffers, it will be at least reasonably clear when one is
dealing with fused types or buffers.
Basically, I think memoryviews should live up to demands of the users,
which would mean there would be no reason to keep the buffer syntax.
But they are different approaches -- use a different type/API, or just try to
speed up parts of NumPy.
One thing to do is make memoryviews coerce cheaply back to the
original objects if wanted (which is likely). Writing
np.asarray(mymemview) is kind of annoying.
It is going to be very confusing to have type(mymemview), repr(mymemview), and
so on come out as NumPy arrays, but not have the full API of NumPy. Unless you
auto-convert on getattr to...
Yeah, the idea is very simple, as you mention: just keep the object
around cached, and when you slice, construct one lazily.
If you want to eradicate the distinction between the backing array and the
memory view and make it transparent, I really suggest you kick back alive
np.ndarray (it can exist in some 'unrealized' state with delayed construction
after slicing, and so on). Implementation much the same either way, it is all
about how it is presented to the user.
You mean the buffer syntax?
Something like mymemview.asobject() could work though, and while not much
shorter, it would have some polymorphism that np.asarray does not have (based
probably on some custom PEP 3118 extension)
I was thinking you could allow the user to register a callback, and
use that to coerce from a memoryview back to an object (given a
memoryview object). For numpy this would be np.asarray, and the
implementation is allowed to cache the result (which it will).
It may be too magicky though... but it will be convenient. The
memoryview will act as a subclass, meaning that any of its methods
will override methods of the converted object.
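A rough sketch of the callback idea in plain Python (not Cython), to make the "acts as a subclass" behaviour concrete. Everything named here (ViewProxy, register, asobject) is hypothetical and only illustrates the proposal; it is not an existing Cython API. The built-in memoryview and array.array stand in for a typed memoryview and for NumPy.

```python
import array


class ViewProxy:
    """Hypothetical stand-in for a memoryview that can coerce back to an object."""

    # Assumed registry: exporter type -> callback taking a memoryview.
    _coercers = {}

    @classmethod
    def register(cls, exporter_type, callback):
        cls._coercers[exporter_type] = callback

    def __init__(self, obj):
        self._view = memoryview(obj)
        self._cached = None  # result of the coercion callback, built lazily

    @property
    def shape(self):
        # Defined on the proxy, so it takes precedence over the object's attribute.
        return self._view.shape

    def asobject(self):
        if self._cached is None:
            exporter = self._view.obj
            # Fall back to returning the exporter itself if nothing is registered.
            callback = self._coercers.get(type(exporter), lambda v: v.obj)
            self._cached = callback(self._view)
        return self._cached

    def __getattr__(self, name):
        # Only reached when normal lookup on the proxy fails.
        return getattr(self.asobject(), name)


# For NumPy the registered callback would be np.asarray; here the exporter is returned.
ViewProxy.register(array.array, lambda view: view.obj)

proxy = ViewProxy(array.array("d", [1.0, 2.0, 3.0]))
print(proxy.shape)      # answered by the proxy itself
print(proxy.typecode)   # forwarded to the coerced array.array
```

Note that the attribute set of the proxy now depends on whatever the callback returns at run time, which is the property the reply below objects to.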
My point was that this seems *way* too magicky.
Beyond "confusing users" and similar objections, which are sort of
subjective, here's a fundamental problem for you: We're making it very
difficult to type-infer memoryviews. Consider:
    cdef double[:] x = ...
    y = x
    print y.shape
Now, because y is not typed, you're semantically throwing in a
conversion on line 2, so that line 3 says that you want the attribute
access to be invoked on "whatever object x coerced back to". And we have
no idea what kind of object that is.
If you don't transparently convert to object, it'd be safe to
automatically infer y as a double[:].
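The same point can be seen with the built-in Python memoryview (a stand-in here, not the Cython type): the view always has one known type and a known attribute set, while "whatever object it coerces back to" varies with the exporter. The helper name describe is made up for this illustration; the .obj attribute assumes Python 3.3 or later.

```python
import array


def describe(obj):
    """Return (view type, exporter type, shape) for a buffer-exporting object."""
    view = memoryview(obj)
    return type(view).__name__, type(view.obj).__name__, view.shape


# The view type is fixed; the exporter type is only known at run time.
print(describe(array.array("d", [1.0, 2.0])))
print(describe(bytearray(b"ab")))
```

If y keeps the view type, y.shape can be resolved at compile time; if y silently becomes the exporter, nothing about y.shape is known in advance.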
On a related note, I've said before that I dislike the notion of
    cdef double[:] mview = obj
I'd rather like
    cdef double[:] mview = double[:](obj)
I support Robert in that "np.ndarray[double]" is the syntax to use when
you want this kind of transparent "be an object when I need to and a
memory view when I need to".
Proposal:
1) We NEVER deprecate "np.ndarray[double]", we commit to keeping that
in the language. It means exactly what you would like double[:] to mean,
i.e. a variable that is a memoryview when you need it to be and an object
otherwise. When you use this type, you bear the consequences of
early-binding things that could in theory be overridden.
2) double[:] is for when you want to access data of *any* Python
object in a generic way. Raw PEP 3118. In those situations, access to
the underlying object is much less useful.
2a) Therefore we require that you do "mview.asobject()" manually;
doing "mview.foo()" is a compile-time error
2b) To drive the point home among users, and aid type inference and
overall language clarity, we REMOVE the auto-acquisition and require
that you do
    cdef double[:] mview = double[:](obj)
2c) Perhaps: Do not even coerce to a Python memoryview and disallow
"print mview"; instead require that you do "print mview.asmemoryview()"
or "print memoryview(mview)" or somesuch.
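To illustrate points 2a and 2b in runnable form, here is a plain-Python analogue using the built-in memoryview. The function names acquire_double_1d and asobject are hypothetical; they only mimic what "double[:](obj)" and "mview.asobject()" might do and are not Cython APIs.

```python
import array


def acquire_double_1d(obj):
    """Hypothetical analogue of double[:](obj): acquire explicitly, or fail loudly."""
    view = memoryview(obj)  # raises TypeError if obj does not export a buffer
    if view.format != "d" or view.ndim != 1:
        raise TypeError(
            f"expected a 1-D buffer of doubles, got format {view.format!r}, "
            f"ndim {view.ndim}"
        )
    return view


def asobject(view):
    """Hypothetical analogue of mview.asobject(): an explicit trip back to the exporter."""
    return view.obj


data = array.array("d", [1.0, 2.0, 3.0])
mview = acquire_double_1d(data)   # acquisition is visible at the call site
print(mview.shape)
print(asobject(mview) is data)    # getting the object back is also explicit
```

Because both directions are spelled out, a reader (or a type inferencer) never has to guess whether a name refers to the view or to the underlying object.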
(A related proposal that's been up earlier has been that a variable can
be annotated with many interfaces; e.g.
    cdef A|B|C obj
...and then when you do "obj.method", it is first looked up in C, then
B, then A, then Python getattr. Not sure if we want to reopen that can
of worms...)
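The lookup order described in that aside can be sketched in plain Python. The classes A, B, C and the resolve helper are hypothetical and model only the ordering (last-listed interface first, then a generic getattr); they say nothing about how Cython would implement it.

```python
class A:
    def method(self):
        return "A.method"

    def only_a(self):
        return "A.only_a"


class B:
    def method(self):
        return "B.method"


class C:
    pass


def resolve(obj, name, interfaces):
    """Return (where_found, attribute), trying the last-listed interface first."""
    for iface in reversed(interfaces):
        if name in vars(iface):
            return iface.__name__, vars(iface)[name]
    # Nothing declared on any interface: fall back to ordinary Python getattr.
    return "getattr", getattr(obj, name)


print(resolve(object(), "method", (A, B, C))[0])   # C lacks it, so B wins over A
print(resolve(object(), "only_a", (A, B, C))[0])
print(resolve("text", "upper", (A, B, C))[0])
```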
Dag