On 8 May 2012 10:47, Dag Sverre Seljebotn <d.s.seljeb...@astro.uio.no> wrote:
> On 05/08/2012 11:30 AM, Dag Sverre Seljebotn wrote:
>> On 05/08/2012 11:22 AM, mark florisson wrote:
>>> On 8 May 2012 09:36, Dag Sverre Seljebotn <d.s.seljeb...@astro.uio.no> wrote:
>>>> On 05/08/2012 10:18 AM, Stefan Behnel wrote:
>>>>> Dag Sverre Seljebotn, 08.05.2012 09:57:
>>>>>> On 05/07/2012 11:21 PM, mark florisson wrote:
>>>>>>> On 7 May 2012 19:40, Dag Sverre Seljebotn wrote:
>>>>>>>> mark florisson wrote:
>>>>>>>>> On 7 May 2012 17:00, Dag Sverre Seljebotn wrote:
>>>>>>>>>> On 05/07/2012 04:16 PM, Stefan Behnel wrote:
>>>>>>>>>>> Stefan Behnel, 07.05.2012 15:04:
>>>>>>>>>>>> Dag Sverre Seljebotn, 07.05.2012 13:48:
>>>>>>>>>>>>> BTW, with the coming of memoryviews, me and Mark talked about
>>>>>>>>>>>>> just deprecating the "mytype[...]" meaning buffers, and rather
>>>>>>>>>>>>> treat it as np.ndarray, array.array etc. being some sort of
>>>>>>>>>>>>> "template types". That is, we disallow "object[int]" and
>>>>>>>>>>>>> require some special declarations in the relevant pxd files.
>>>>>>>>>>>>
>>>>>>>>>>>> Hmm, yes, it's unfortunate that we have two different types of
>>>>>>>>>>>> syntax now, one that declares the item type before the brackets
>>>>>>>>>>>> and one that declares it afterwards.
>>>>>>>>>>>
>>>>>>>>>>> Should we consider the buffer interface syntax deprecated and
>>>>>>>>>>> focus on the memory view syntax?
>>>>>>>>>>
>>>>>>>>>> I think that's the very-long-term intention.
>>>>>>>>>> Then again, it may be too early to really tell yet, we just need
>>>>>>>>>> to see how the memory views play out in real life and whether
>>>>>>>>>> they'll be able to replace np.ndarray[double] among real users.
>>>>>>>>>> We don't want to shove things down users' throats.
>>>>>>>>>>
>>>>>>>>>> But the use of the trailing-[] syntax needs some cleaning up. Me
>>>>>>>>>> and Mark agreed we'd put this proposal forward when we got around
>>>>>>>>>> to it:
>>>>>>>>>>
>>>>>>>>>> - Deprecate the "object[double]" form, where [dtype] can be stuck
>>>>>>>>>>   on any extension type
>>>>>>>>>>
>>>>>>>>>> - But, do NOT (for the next year at least) deprecate
>>>>>>>>>>   np.ndarray[double], array.array[double], etc. Basically, there
>>>>>>>>>>   should be a magic flag in extension type declarations saying
>>>>>>>>>>   "I can be a buffer".
>>>>>>>>>>
>>>>>>>>>> For one thing, that is sort of needed to open up things for
>>>>>>>>>> templated cdef classes/fused types cdef classes, if that is ever
>>>>>>>>>> implemented.
>>>>>>>>>
>>>>>>>>> Deprecating is definitely a good start. I think at least if you
>>>>>>>>> only allow two types as buffers it will be at least reasonably
>>>>>>>>> clear when one is dealing with fused types or buffers.
>>>>>>>>>
>>>>>>>>> Basically, I think memoryviews should live up to demands of the
>>>>>>>>> users, which would mean there would be no reason to keep the
>>>>>>>>> buffer syntax.
>>>>>>>>
>>>>>>>> But they are different approaches -- use a different type/API, or
>>>>>>>> just try to speed up parts of NumPy..
>>>>>>>>
>>>>>>>>> One thing to do is make memoryviews coerce cheaply back to the
>>>>>>>>> original objects if wanted (which is likely). Writing
>>>>>>>>> np.asarray(mymemview) is kind of annoying.
>>>>>>>>
>>>>>>>> It is going to be very confusing to have type(mymemview),
>>>>>>>> repr(mymemview), and so on come out as NumPy arrays, but not have
>>>>>>>> the full API of NumPy. Unless you auto-convert on getattr to...
>>>>>>>
>>>>>>> Yeah, the idea is very simple, as you mention, just keep the object
>>>>>>> around cached, and when you slice construct one lazily.
>>>>>>>
>>>>>>>> If you want to eradicate the distinction between the backing array
>>>>>>>> and the memory view and make it transparent, I really suggest you
>>>>>>>> kick back alive np.ndarray (it can exist in some 'unrealized' state
>>>>>>>> with delayed construction after slicing, and so on). Implementation
>>>>>>>> much the same either way, it is all about how it is presented to
>>>>>>>> the user.
>>>>>>>
>>>>>>> You mean the buffer syntax?
>>>>>>>
>>>>>>>> Something like mymemview.asobject() could work though, and while
>>>>>>>> not much shorter, it would have some polymorphism that np.asarray
>>>>>>>> does not have (based probably on some custom PEP 3118 extension)
>>>>>>>
>>>>>>> I was thinking you could allow the user to register a callback, and
>>>>>>> use that to coerce from a memoryview back to an object (given a
>>>>>>> memoryview object). For numpy this would be np.asarray, and the
>>>>>>> implementation is allowed to cache the result (which it will).
>>>>>>> It may be too magicky though... but it will be convenient. The
>>>>>>> memoryview will act as a subclass, meaning that any of its methods
>>>>>>> will override methods of the converted object.
>>>>>>
>>>>>> My point was that this seems *way* too magicky.
>>>>>>
>>>>>> Beyond "confusing users" and so on that are sort of subjective,
>>>>>> here's a fundamental problem for you: We're making it very difficult
>>>>>> to type-infer memoryviews. Consider:
>>>>>>
>>>>>>     cdef double[:] x = ...
>>>>>>     y = x
>>>>>>     print y.shape
>>>>>>
>>>>>> Now, because y is not typed, you're semantically throwing in a
>>>>>> conversion on line 2, so that line 3 says that you want the attribute
>>>>>> access to be invoked on "whatever object x coerced back to". And we
>>>>>> have no idea what kind of object that is.
>>>>>>
>>>>>> If you don't transparently convert to object, it'd be safe to
>>>>>> automatically infer y as a double[:].
>>>>>
>>>>> Why can't y be inferred as the type of x due to the assignment?
>>>>>
>>>>>> On a related note, I've said before that I dislike the notion of
>>>>>>
>>>>>>     cdef double[:] mview = obj
>>>>>>
>>>>>> I'd rather like
>>>>>>
>>>>>>     cdef double[:] mview = double[:](obj)
>>>>>
>>>>> Why? We currently allow
>>>>>
>>>>>     cdef char* s = some_py_bytes_string
>>>>>
>>>>> Auto-coercion is a serious part of the language, and I don't see the
>>>>> advantage of requiring the redundancy in the case above. It's clear
>>>>> enough to me what the typed assignment is intended to mean: get me a
>>>>> buffer view on the object, regardless of what it is.
>>>>>
>>>>>> I support Robert in that "np.ndarray[double]" is the syntax to use
>>>>>> when you want this kind of transparent "be an object when I need to
>>>>>> and a memory view when I need to".
>>>>>>
>>>>>> Proposal:
>>>>>>
>>>>>> 1) We NEVER deprecate "np.ndarray[double]", we commit to keeping that
>>>>>> in the language. It means exactly what you would like double[:] to
>>>>>> mean, i.e. a variable that is memoryview when you need to and an
>>>>>> object otherwise. When you use this type, you bear the consequences
>>>>>> of early-binding things that could in theory be overridden.
>>>>>>
>>>>>> 2) double[:] is for when you want to access data of *any* Python
>>>>>> object in a generic way. Raw PEP 3118.
>>>>>> In those situations, access to the underlying object is much less
>>>>>> useful.
>>>>>>
>>>>>> 2a) Therefore we require that you do "mview.asobject()" manually;
>>>>>> doing "mview.foo()" is a compile-time error
>>>>>
>>>>> Sounds good. I think that would clean up the current syntax overlap
>>>>> very nicely.
>>>>>
>>>>>> 2b) To drive the point home among users, and aid type inference and
>>>>>> overall language clarity, we REMOVE the auto-acquisition and require
>>>>>> that you do
>>>>>>
>>>>>>     cdef double[:] mview = double[:](obj)
>>>>>
>>>>> I don't see the point, as noted above. Either "obj" is statically
>>>>> typed and the bare assignment becomes a no-op, or it's not typed and
>>>>> the assignment coerces by creating a view. As with all other typed
>>>>> assignments.
>>>>>
>>>>>> 2c) Perhaps: Do not even coerce to a Python memoryview and disallow
>>>>>> "print mview"; instead require that you do
>>>>>> "print mview.asmemoryview()" or "print memoryview(mview)" or
>>>>>> somesuch.
>>>>>
>>>>> This seems to depend on 2b.
>>>>
>>>> This I don't understand. The question of 2c) is the analogue to
>>>> auto-coercion of "char*" to bytes; approving 2c) would put memoryviews
>>>> in line with char*.
>>>>
>>>> Then again, we could in future auto-coerce char* to a ctypes pointer,
>>>> and in that case, coercing a memoryview to an object representing that
>>>> memoryview would be OK.
>>>
>>> Character pointers coerce to strings. Hell, even structs coerce to and
>>> from python dicts, so disallowing the same for memoryviews would just
>>> be inconsistent and inconvenient.
>>
>> OK, but even structs don't coerce back to some arbitrary type, it's
>> always a dict. I don't necessarily oppose coercing memoryviews to some
>> Python memoryview object (not necessarily the builtin).
>>
>> I agree that some mview.asobject() triggering a callback defined by some
>> CEP 1xxx ("cross-language CEP") would be really useful; and that could
>> form the basis of a new, improved np.ndarray[double] that allows fast
>> slicing etc. (where that is used automatically whenever needed).
>
> After some thinking I believe I can see more clearly where Mark is coming
> from. To sum up, it's either
>
> A) Keep both np.ndarray[double] and double[:] around, with clearly defined
> and separate roles. np.ndarray[double] implementation is revamped to allow
> fast slicing etc., based on the double[:] implementation.
>
> B) Deprecate np.ndarray[double] sooner rather than later, but make double[:]
> have functionality that is *really* close to what np.ndarray[double]
> currently does. In most cases one should be able to basically replace
> np.ndarray[double] with double[:] and the code should continue to work just
> like before; difference is that if you pass in anything else than a NumPy
> array, it will likely fail with a runtime AttributeError at some point
> rather than fail a PyType_Check.

That's a good summary. I have a big preference for B here, but I agree
that treating a typed memoryview as both a user object (possibly
converted through a callback) and a typed memoryview "subclass" is quite
magicky. I wouldn't particularly mind something concise like 'm.obj'.
The AttributeError would be raised as usual, whenever a Python object
doesn't have the right interface.

> Between those two I believe it's a matter of design taste, not so much
> rational argument, and I don't know where I stand yet. And I'm going to stop
> thinking about it until I see what Robert says...
>
> Dag
> _______________________________________________
> cython-devel mailing list
> cython-devel@python.org
> http://mail.python.org/mailman/listinfo/cython-devel
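
A minimal sketch of the explicit-access idea ('m.obj' or mview.asobject())
in plain Python, for illustration only. It relies on the builtin memoryview,
which already exposes its exporter through the .obj attribute. TypedView,
register_coercion and the registry are hypothetical names invented here,
array.array stands in for a NumPy array, and none of this is meant to
describe how Cython implements typed memoryviews.

```python
import array

# Hypothetical registry: exporter type -> callback that turns a raw view
# back into a user-facing object (for NumPy the thread suggests np.asarray).
_coercions = {}


def register_coercion(exporter_type, callback):
    """Register the callback used to coerce views of exporter_type."""
    _coercions[exporter_type] = callback


class TypedView:
    """Illustrative stand-in for a typed memoryview; not Cython's class."""

    def __init__(self, exporter):
        self._view = memoryview(exporter)  # raw PEP 3118 buffer access
        self._cached = None                # filled in lazily by .obj

    def __getitem__(self, index):
        return self._view[index]

    @property
    def obj(self):
        # Explicit coercion back to an object, computed once and cached.
        if self._cached is None:
            exporter = self._view.obj      # builtin memoryview attribute
            callback = _coercions.get(type(exporter), lambda view: view.obj)
            self._cached = callback(self._view)
        return self._cached


register_coercion(array.array, lambda view: view.obj)

backing = array.array("d", [1.0, 2.0, 3.0])
m = TypedView(backing)

print(m[1])                 # element access goes through the buffer
print(m.obj is backing)     # the explicit step back to the object
try:
    m.count(2.0)            # not forwarded: the view is not the object
except AttributeError as exc:
    print("AttributeError:", exc)
print(m.obj.count(2.0))     # works once the object is requested explicitly
```

Under option B, attribute access on the view itself would instead be
forwarded to the coerced object, which is the part the thread calls
magicky; the sketch shows only the explicit variant.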