Sorry about the length, but I'd like this to be close to my 
last email on the subject. I'd just refer to Robert's mail, but I guess 
some more explanation about NumPy semantics is in order for the benefit 
of non-NumPy-users, so I've made a summary of that.

Stefan Behnel wrote:
> Dag Sverre Seljebotn wrote:
>> Stefan Behnel wrote:
>>> we have three types:
>>>
>>> 1) a dynamic array type
>>> - allocates memory on creation
>>> - reallocates on (explicit) resizing, e.g. a .resize() method
>>> - supports PEP 3118 (and disables shrinking with live buffers)
>>> - returns a typed value on indexing
>>> - returns a typed array copy on slicing
>>> - behaves like a tuple otherwise
>>>
>>> 2) a typed memory view
>>> - created on top of a buffer (or array)
>>> - never allocates memory (for data, that is)
>>> - creates a new view object on slicing
>>> - behaves like an array otherwise
>> This last point is dangerous as we seem to disagree about what an array
>> is.
> 
> It's what I described under 1).
> 
> 
>>> 3) a SIMD memory view
>>> - created on top of a buffer, array or memory view
>>> - supports parallel per-item arithmetic
>>> - behaves like a memory view otherwise
>> Good summary. Starting from this: I want int[:,:] to be the combination
>> of 2) and 3)
> 
> You mean "3) and not 2)", right? Could you explain why you need a syntax
> for this if it's only a view?

I suppose I meant some variation of 3) with some extra bullet points 
(slicing in particular). We need a syntax because SIMD operations must 
be handled as a special case at compile time.

Robert put it well; what I want is the core NumPy array semantics on a 
view to any array memory -- builtin, so that it can be optimized 
compile-time. We need to return to that; trying to distill something 
else and more generic out of this seems to only bring confusion.

(This is about 1) and 2) in Robert's mail only though.)

I'll make a list of what we mean by NumPy semantics below. At the very 
bottom is some things which I think should *not* be included.

First:

1) Nobody is claiming this is elegant or Pythonic. It is catering to a 
numerical special interest, nothing more and nothing less.

2) As Robert put it: He won't use it himself, but the rest of Cython 
benefits indirectly from all the interest the numerical users bring.

3) The proposed semantics below are really not up for in-detail 
discussion; what I'm really after is a "yes" or "no". I just don't 
have the time, and NumPy is the de facto standard for Python numerics 
and what everybody expects anyway. I don't want to invent something 
entirely new.

That said, here's a long list of what I mean by NumPy semantics, 
assuming both CEPs are implemented.

# make x a compile-time-optimizeable 2D view on memoryview(obj)
cdef int[:,:] x = obj

# make an unassigned 1D view
cdef int[:] y

# Indexing
x[2,3]

# Access shape, stride info, raw data pointer
x.shape
x.strides
x.data
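For non-NumPy-users, here is roughly what those attributes hold, sketched in plain NumPy (this assumes 4-byte C ints and row-major, C-contiguous layout):

```python
import numpy as np

# Plain NumPy illustration of .shape and .strides; assumes 4-byte
# C ints and row-major (C-contiguous) memory layout.
x = np.zeros((3, 4), dtype=np.intc)
print(x.shape)    # (3, 4)
print(x.strides)  # (16, 4): 16 bytes to the next row, 4 to the next column
```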

# Slicing out new view of third row (in two ways)
y = x[2,:]
y = x[2,...]

# Now, modifying y modifies what x points to too.
# Make a copy so that y points to separate memory:
y = y.copy()

# Indexing with None creates new, 1-length axis
x = y[None, :] # x.shape == (1, y.shape[0])
x = y[:, None] # x.shape == (y.shape[0], 1)

# Now, this:
x[0, 3] = 2
# modifies y[3] too.
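For the benefit of non-NumPy-users, the view/copy/None semantics above can be checked in NumPy itself (a plain Python sketch, not the proposed Cython syntax):

```python
import numpy as np

x = np.arange(12, dtype=np.intc).reshape(3, 4)

# Slicing returns a view: writing through y writes into x's memory.
y = x[2, :]
y[3] = 99
assert x[2, 3] == 99

# .copy() detaches y from x's memory.
y = y.copy()
y[3] = 0
assert x[2, 3] == 99   # x is unaffected now

# Indexing with None inserts a new length-1 axis.
assert y[None, :].shape == (1, 4)
assert y[:, None].shape == (4, 1)
```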

# New view of exactly same data
x[:,:]
x[...]

# Set all entries in the array to 12
x[...] = 12

# Set only first row to 10
x[0, :] = 10

# Some ways of multiplying all elements with 2
x *= 2
x[...] *= 2
x[:,:] *= 2
x += x
x[...] += x

# A more complicated expression...allocates memory
x = stdmath.sqrt(x*x + x*(x+1)/(x+2))

# A more complicated expression...overwrites existing
# memory
x[...] = stdmath.sqrt(x*x + x*(x+1)/(x+2))
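The allocate-vs-overwrite distinction is the same one NumPy has today; here np.sqrt stands in for the hypothetical stdmath.sqrt:

```python
import numpy as np

# np.sqrt is a stand-in for the hypothetical stdmath.sqrt.
x = np.arange(4.0)
buf = x                  # keep a second reference to the same memory

x = np.sqrt(x * x)       # rebinds x to freshly allocated memory
assert x is not buf

x = buf
x[...] = np.sqrt(x * x)  # writes the result into the existing memory
assert x is buf
```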

# Boolean operators
cdef bint[:,:] b # perhaps we could support 8-bit bool too
b = (x == 2)
# b is now an array with the shape of x, True where x[i,j] == 2
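In NumPy the comparison already works this way; the proposed bint[:,:] would simply hold the same values:

```python
import numpy as np

# NumPy sketch of the elementwise comparison described above.
x = np.array([[1, 2], [2, 3]], dtype=np.intc)
b = (x == 2)
assert b.dtype == np.bool_
assert b.tolist() == [[False, True], [True, False]]
```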

# Get sum of elements
import numpy as np
np.sum(x)

# As for printing/coercion to Python object, that remains
# TBD. Either memoryview, or a pretty-printing subclass
# of memoryview, implementing NumPy's __array__ protocol
# as well for better compatibility

Here's what I do NOT want to include from NumPy:

# Get sum and mean
x.sum()
x.mean()
# and so on; you have to use np.sum(x) instead.

# "Fancy indexing" is a mess because the returned object
# (due to implementation constraints) is a copy, not a view,
# thus being inconsistent with the above. My stance is that
# this can go in when we can support treating it as a view,
# instead of following NumPy with making a copy. I have ideas
# for how to do this.

# Get the intersecting array of rows 1, 4 and 5 and
# columns 2 and 1
new_data_copy = x[[1,4,5], [2,1]]

# Set the same intersection to 0. This is where NumPy gets
# really inconsistent; making an exception specifically
# in __setitem__ for this case.
x[[1,4,5], [2,1]] = 0
# modified x

# If y has length 4, pick out elements 0 and 3
y[[True, False, False, True]]

...and so on.
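For reference, the copy-vs-view inconsistency I'm objecting to is easy to demonstrate in today's NumPy (plain Python, not part of the proposal):

```python
import numpy as np

x = np.arange(12).reshape(3, 4)

# Fancy indexing returns a copy, unlike slicing:
rows = x[[0, 2]]        # copy of rows 0 and 2
rows[0, 0] = -1
assert x[0, 0] == 0     # original is untouched

# ...while __setitem__ with the same index writes through to x:
x[[0, 2]] = 0
assert x[0, 0] == 0 and x[2, 3] == 0

# Boolean-mask indexing also returns a copy:
y = np.array([10, 20, 30, 40])
picked = y[[True, False, False, True]]
assert picked.tolist() == [10, 40]
```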

-- 
Dag Sverre
_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev