Stefan Behnel wrote:
> Dag Sverre Seljebotn wrote:
>   
>> Thanks to chats with Kurt and Robert, I think I've managed to present my 
>> thoughts from a different and more constructive angle.
>>
>> http://wiki.cython.org/enhancements/array
>>
>> There's some time constraints in the picture for various reasons (e.g. 
>> Fortran GSoC development direction) so it would be great if this could 
>> be resolved one way or the other before too long.
>>     
>
> I'm mostly happy with the proposal. IMHO, this would also fit many use
> cases of ticket 153
>   
Nice, and thanks for the feedback. I was mostly worried that this would 
be seen as too special-purpose for inclusion in mainline Cython.

I'll start with what I consider most important here:

> - Should this
> """
> Arithmetic (a + b turned into a loop on the arrays a and b)
> """
>
> read
>
> """
> Arithmetic (a + b returns a concatenated copy of a and b)
> """
> ?
>   
Well, that would upset all of us numerical users a lot :-)

Doing arithmetic componentwise is one of the big ideas here. If + means 
concatenation you perhaps gain a convenient array type, but it becomes 
much, *much* less useful for numerical purposes. I can see why one would 
argue against it though (componentwise + isn't the convention elsewhere 
in Python; array.array and list do concatenation, etc.), so this is where 
I can only state what I'd personally find really useful. I feel pragmatism 
should win this one, but you've got purism on your side.

PEP 3118 doesn't allow resizing buffers, which makes concatenation less 
natural to me. Though we could always return a new copy...

Anyway, say we want to make the use cases of both groups possible. Then 
the choice seems clear: if we use + for addition, one can make use of 
the full set +, -, /, *, ** on arrays, while concatenation, being only one 
operation, could be moved into "cat(arr1, arr2)" or something. Using + 
for concatenation leaves -, /, *, ** useless...

I'm convinced that + would be used *a lot* more for componentwise 
addition than for concatenation.
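To make the intended semantics concrete, here is a rough sketch (in syntax 
along the lines of the proposal, with a made-up helper name, restricted to 
1-D int buffers) of the kind of loop "c = a + b" would essentially be 
turned into:

cdef void add_into(int[:] a, int[:] b, int[:] c):
    # componentwise addition into a preallocated result buffer
    cdef Py_ssize_t i
    for i in range(a.shape[0]):
        c[i] = a[i] + b[i]

Concatenation, by contrast, always needs a freshly allocated buffer of 
length a.shape[0] + b.shape[0], which is another reason a separate 
cat(arr1, arr2) helper feels like the natural home for it.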


> http://trac.cython.org/cython_trac/ticket/153
>
> - I wonder why the "custom reference counting" is necessary, though. I
> assume that the main advantage of doing it in Cython instead of CPython is
> that we could allow memory allocation and ref-counting in nogil blocks, but
> that would require us to add some other kind of synchronisation for arrays
> that pass function barriers. I don't think that's worth it.
>
> I doubt that many people would be surprised by getting a real Python object
> when they use this feature, so why not just implement Cython arrays as
> PyVarObjects?
>   
Yes, I've been thinking about this as well, and you are right that the 
proposal isn't quite there in that area.

I'll need to think more before I write more about this. Suffice it to say 
for now that there are several layers:

1) The memory container object. Definitely a Python object.
2) Typed buffer acquisition, likely containing a single refcounted 
Py_buffer.
3) The actual variables, which will be structs with unpacked information 
from the Py_buffer (see the sketch below).
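
For layer 3 I'm picturing something along these lines (only a sketch, 
assuming a 2-D buffer, with made-up field names):

cdef struct _buf2d:
    char *data              # Py_buffer.buf
    Py_ssize_t shape[2]     # extent of each dimension
    Py_ssize_t strides[2]   # byte stride of each dimension

Indexing m[i, j] would then compile down to plain pointer arithmetic on 
data and strides, with no Python API calls in the inner loop.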

Although I would like multithreading to be easier for numerical 
programming, here's an orthogonal solution: a new kind of "nogil" 
section which releases the GIL but temporarily reacquires it around Python 
operations. For most code this would be slower, but it would allow writing 
mostly nogil code while still being able to take a slice or reassign an 
array here and there.
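
Written out by hand with the constructs we already have (or have been 
discussing), the idea is roughly this; the names are made up, and the 
proposed section would simply insert the inner "with gil" block 
automatically:

with nogil:
    for i in range(n):
        out[i] = c_only_work(data[i])    # pure C work, no GIL needed
    with gil:
        view = obj[1:]                   # Python operation: GIL reacquired briefly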

I need to reread that post on what issues Greg had with "with gil" though.

> - Another point is the "inout" modifier in the one example. Would that only
> apply to C++ or would the semantics be the same in C? And: is it a general
> modifier that would also apply to scalar variables (and how), or only to
> arrays? What about pointers?
>
> If it only applies to arrays, maybe putting it inside the array declaration
> (e.g. as a kw arg) makes more sense?
>   
Consider this:

ctypedef int[:,:] matrix   # a typedef name for the buffer type

cdef func(inout matrix arg):
    cdef matrix var
    ...

With a typedef like this there are no brackets at all at the point of use, 
so I'd like to avoid putting the modifier inside the brackets.

It comes from Fortran; there's no parallel in C or C++. I think it only 
applies to arrays, and it is necessary to avoid unnecessary copying. It 
means that the contents of the array are copied into a new, contiguous 
array when entering the function, and then back again afterwards. This 
is just an instruction about what should happen to the *contents* of the 
array, so it's not in breach of Python semantics; it specifically would 
not make sense for scalars, and I think not for pointers either.
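
Spelled out by hand (just a sketch with made-up names and a 1-D double 
argument), "inout" would amount to the compiler inserting the copy-in and 
copy-back for you:

cdef void scale(double[:] arg):
    cdef double[::1] tmp = arg.copy()    # copy-in: pack into a contiguous array
    cdef Py_ssize_t i
    for i in range(tmp.shape[0]):
        tmp[i] *= 2.0                    # the body works on the contiguous copy
    arg[:] = tmp                         # copy-out: write the contents back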


Dag Sverre
