On 01/05/2013 10:31 PM, Nathaniel Smith wrote:
> On 5 Jan 2013 12:16, "Matthew Brett" <matthew.br...@gmail.com> wrote:
>>
>> Hi,
>>
>> Following on from Nathaniel's explorations of the scalar - array
>> casting rules, some resources on rank-0 arrays.
>>
>> The discussion that Nathaniel tracked down on "rank-0 arrays"; it also
>> makes reference to casting. The rank-0 arrays seem to have been one
>> way of solving the problem of maintaining array dtypes other than
>> bool / float / int:
>>
>> http://mail.scipy.org/pipermail/numpy-discussion/2002-September/001612.html
>>
>> Quoting from an email from Travis in that thread, replying to an email
>> from Tim Hochberg:
>>
>> http://mail.scipy.org/pipermail/numpy-discussion/2002-September/001647.html
>>
>> <quote>
>>> Frankly, I have no idea what the implementation details would be, but
>>> could we get rid of rank-0 arrays altogether? I have always simply found
>>> them strange and confusing... What are they really necessary for
>>> (besides holding scalar values of different precision than standard
>>> Python scalars)?
>>
>> With new coercion rules this becomes a possibility. Arguments against it
>> are that special rank-0 arrays behave as more consistent numbers with the
>> rest of Numeric than Python scalars. In other words they have a length
>> and a shape and one can write N-dimensional code that works the same even
>> when the result is a scalar.
>>
>> Another advantage of having a Numeric scalar is that we can control the
>> behavior of floating point operations better.
>>
>> e.g.
>>
>> if only Python scalars were available and sum(a) returned 0, then
>>
>> 1 / sum(a) would behave as Python behaves (always raises error).
>>
>> while with our own scalars
>>
>> 1 / sum(a) could potentially behave however the user wanted.
>> </quote>
>>
>> There seemed then to be some impetus to remove rank-0 arrays and
>> replace them with Python scalar types with the various numpy
>> precisions:
>>
>> http://mail.scipy.org/pipermail/numpy-discussion/2002-September/013983.html
>>
>> Travis' recent email hints at something that seems similar, but I
>> don't understand what he means:
>>
>> http://mail.scipy.org/pipermail/numpy-discussion/2012-December/064795.html
>>
>> <quote>
>> Don't create array-scalars. Instead, make the data-type object a
>> meta-type object whose instances are the items returned from NumPy
>> arrays. There is no need for a separate array-scalar object and in
>> fact it's confusing to the type-system. I understand that now. I
>> did not understand that 5 years ago.
>> </quote>
>>
>> Travis - can you expand?
>
> Numpy has 3 partially overlapping concepts:
>
> A) scalars (what Travis calls "array scalars"): Things like "float64",
> "int32". These are ordinary Python classes; usually when you subscript
> an array, what you get back is an instance of one of these classes:
>
> In [1]: a = np.array([1, 2, 3])
>
> In [2]: a[0]
> Out[2]: 1
>
> In [3]: type(a[0])
> Out[3]: numpy.int64
>
> Note that even though they are called "array scalars", they have
> nothing to do with the actual ndarray type -- they are totally
> separate objects.
>
> B) dtypes: These are instances of class np.dtype. For every scalar
> type, there is a corresponding dtype object; plus you can create new
> dtype objects for things like record arrays (which correspond to
> scalars of type "np.void"; I don't really understand how void scalars
> work in detail):
>
> In [8]: int64_dtype = np.dtype(np.int64)
>
> In [9]: int64_dtype
> Out[9]: dtype('int64')
>
> In [10]: type(int64_dtype)
> Out[10]: numpy.dtype
>
> In [11]: int64_dtype.type
> Out[11]: numpy.int64
>
> C) rank-0 arrays: Plain old ndarray objects that happen to have
> ndim == 0, shape == ().
> These are arrays which are scalars, but they are
> not array scalars. Arrays HAVE-A dtype.
>
> In [15]: int64_arr = np.array(1)
>
> In [16]: int64_arr
> Out[16]: array(1)
>
> In [17]: int64_arr.dtype
> Out[17]: dtype('int64')
>
> ------------
>
> Okay, given that background:
>
> What Travis was saying in that email was that he thought (A) and (B)
> should be combined. Instead of having np.float64-the-class and
> dtype(np.float64)-the-dtype-object, we should make dtype objects
> actually *be* the scalar classes. (They would still be dtype objects,
> which means they would be "metaclasses", which is just a fancy way to
> say, dtype would be a subclass of the Python class "type", and dtype
> objects would be class objects that had extra functionality.)
>
> Those old mailing list threads are debating (A) versus (C). What
> we ended up with is what I described above -- we have "rank-0"
> (0-dimensional) arrays, and we have array scalar objects that are a
> different set of Python types and objects entirely. The actual
> implementation is totally different -- to the point that we have a
> 35,000 line auto-generated C file implementing arithmetic for scalars,
> *and* a 10,000 line auto-generated C file implementing arithmetic for
> arrays (including 0-dim arrays), and these have different functionality
> and bugs:
> https://github.com/numpy/numpy/issues/593
>
> However, the actual goal of all this code is to make array scalars and
> 0-dim arrays entirely indistinguishable. Supposedly they have the same
> APIs and generally behave exactly the same, modulo bugs (but surely
> there can't be many of those...), and two things:
>
> 1) isinstance(scalar, np.int64) is a sorta-legitimate way to do a type
> check. But isinstance(zerodim_arr, np.int64) is always false. Instead
> you have to use issubdtype(zerodim_arr, np.int64). (I mean, obviously,
> right?)
>
> 2) Scalars are always read-only, like regular Python scalars. 0-dim
> arrays are in general writeable...
> unless you set them to read-only. I
> think the only behavioural difference between an array scalar and a
> read-only 0-dim array is that for read-only 0-dim arrays, in-place
> operations raise an exception:
>
> In [5]: scalar = np.int64(1)
>
> # same as 'scalar = scalar + 2', i.e., creates a new object
> In [6]: scalar += 2
>
> In [7]: scalar
> Out[7]: 3
>
> In [10]: zerodim = np.array(1)
>
> In [11]: zerodim.flags.writeable = False
>
> In [12]: zerodim += 2
> ValueError: return array is not writeable
>
> Also, scalar indexing of ndarrays returns scalar objects. Except when
> it returns a 0-dim array -- I'm pretty sure this can happen when the
> moon is right, though I forget the details. ndarray subclasses? custom
> dtypes? Maybe someone will remember.
>
> Q: We could make += work on read-only arrays with, like, a 2-line fix.
> So wouldn't it be simpler to throw away the tens of thousands of lines
> of code used to implement scalars, and just use 0-dim arrays
> everywhere instead? So like, np.array([1, 2, 3])[1] would return a
> read-only 0-dim array, which acted just like the current scalar
> objects in basically every way?
>
> A: Excellent question! So ndarrays would be similar to Python strings
> -- indexing an ndarray would return another ndarray, just like
> indexing a string returns another string?
>
> Q: Yeah. I mean, I remember that seemed weird when I first learned
> Python, but when have you ever felt that Python was really missing a
> "character" type like C has?
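For readers who want to try the three concepts quoted above (A: array scalars, B: dtypes, C: 0-dim arrays), here is a minimal sketch against a recent NumPy. It assumes current behaviour, which differs in details from 2013 (for example, the text of the read-only error message), so it only checks the exception type. `dtype=np.int64` is passed explicitly because the default integer type is platform-dependent.

```python
import numpy as np

# (A) Integer indexing returns an array scalar: an instance of np.int64,
# a Python class that is not a subclass of ndarray.
a = np.array([1, 2, 3], dtype=np.int64)
scalar = a[0]
print(type(scalar))                        # <class 'numpy.int64'>
print(isinstance(scalar, np.ndarray))      # False

# (B) Each scalar class has a matching dtype object; .type maps back to the class.
int64_dtype = np.dtype(np.int64)
print(int64_dtype.type is np.int64)        # True

# (C) A 0-dim array is a plain ndarray with ndim == 0 and shape == ();
# it HAS-A dtype rather than being an instance of the scalar class.
zerodim = np.array(1, dtype=np.int64)
print(zerodim.ndim, zerodim.shape)         # 0 ()
print(isinstance(zerodim, np.int64))       # False
print(np.issubdtype(zerodim.dtype, np.int64))  # True

# In-place addition on a scalar rebinds the name to a new object;
# the array element it came from is untouched.
scalar += 2
print(scalar, a[0])                        # 3 1

# The same operation on a read-only 0-dim array raises instead.
zerodim.flags.writeable = False
raised = False
try:
    zerodim += 2
except ValueError:
    raised = True
print(raised)                              # True
```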

str is immutable, which makes this a lot easier to deal with without
getting confused. So basically you have:

    a[0:1]  # read-write view
    a[[0]]  # read-write copy
    a[0]    # read-only view

AND, += is allowed on all read-only arrays; it just transparently
creates a copy instead of doing the operation in-place.

Try to enumerate all the fundamentally different things (if you count
memory use/running time) that can happen for ndarrays a, b, and
arbitrary x here:

    a += b[x]

That's already quite a lot; your proposal adds even more options. It's
certainly a lot more complicated than str.

To me it all sounds like a lot of rules introduced just to have the
result of a[0] be "kind of a scalar" without actually choosing that
option.

BUT I should read up on that thread you posted on why that won't work;
I didn't have time yet...

Dag Sverre
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
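For comparison with Dag's list, a minimal sketch of how the three indexing forms behave in current NumPy, where a[0] still returns an array scalar rather than the read-only 0-dim view under discussion. It assumes NumPy 1.11 or later (for `np.shares_memory`). The arrays and added values are made-up examples, and the loop only shows what kind of object `b[x]` yields for each form of `x`; it does not cover every case Dag has in mind.

```python
import numpy as np

a = np.arange(5, dtype=np.int64)

view = a[0:1]    # basic slice: a read-write view sharing a's memory
copy = a[[0]]    # integer-list ("fancy") index: a new read-write array
item = a[0]      # integer index: an array scalar today, not a view

print(np.shares_memory(a, view), np.shares_memory(a, copy))  # True False
print(view.flags.writeable, copy.flags.writeable)            # True True
print(type(item))                                            # <class 'numpy.int64'>

view[0] = 10     # writes through to a
copy[0] = 20     # only changes the copy
print(a[0])      # 10

# What `a += b[x]` operates on depends on the kind of x.
b = np.arange(5, dtype=np.int64) * 100
kinds = []
for x in (slice(1, 2), [1], 1):
    rhs = b[x]   # view, copy, or array scalar respectively
    kinds.append((type(rhs).__name__, np.shape(rhs)))
    a += rhs     # rhs broadcasts against a; a is updated in place
print(kinds)     # [('ndarray', (1,)), ('ndarray', (1,)), ('int64', ())]
print(a)         # [310 301 302 303 304]
```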