Re: [Numpy-discussion] On my Cython/NumPy project

2008-06-22 Thread Dag Sverre Seljebotn
Matthew Brett wrote:
 Hi,

 The feature of compiling code for multiple types is somewhat orthogonal to
 ndarray support; better treat them separately and take one at a time.

 Well, it's relevant to numpy because if you want to implement - for
 example - a numpy sort, then you've got to deal with an unspecified
 number of dimensions, and one of a range of specified types, otherwise
 you will end up copying, casting, rejecting types and so on...

Thanks for the feedback.

Sure; I didn't mean to imply that it was irrelevant -- indeed, I think it
would be very useful to NumPy users. I just mean to say that it is
orthogonal: it is also important, but it should be treated as a separate
feature that is independent of NumPy support as such. To give you a flavor
of why this is a nontrivial issue, consider a function taking four different
NumPy arrays. You would then need to create approx. 15**4 (over 50,000)
different versions of it, one for each combination of types -- not feasible
at all. I.e., you need some way of specifying that these two arrays have the
same datatype, and so on -- some kind of generalized/template programming.
I'd rather focus on the easy case for now, without precluding more features
later.
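
For illustration only (using the proposed, not-yet-final syntax; the function
names are made up), this is what hand-written specialization would look like
for just two of the type combinations:

def add_f64(np.ndarray[np.float64, 2] a, np.ndarray[np.float64, 2] b):
    pass   # one compiled specialization per combination of argument types

def add_i32(np.ndarray[np.int32, 2] a, np.ndarray[np.int32, 2] b):
    pass   # another one; with four independent array arguments this explodes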

When it comes to not knowing the number of dimensions, the right way to go
about that would be to support NumPy's dimension-neutral array iterators in
a nice way. I have a lot of thoughts about that too, but it's lower on
the priority list and I'll hold off on discussing it until the more direct
case is solved (as that is the one new NumPy/Cython users will miss most).

On the negative indices side: I really like the unsigned int idea. The
problem with range is that it can, at least in theory, grow larger than
MAX_INT, so we are cautious about automatically inferring that the
iterator variable should be unsigned int (Python ints can be arbitrarily
large). Of course, one could compile two versions of each loop and pick
between them with an if-statement... For now, I'll probably just drop the
negative-index test for explicitly declared unsigned int variables, while
having range imply unsigned int will be dealt with together with more
general type inference, which the Cython developers are also thinking about
(though whether we'll have the developer resources for it is another
question).
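
To be concrete, the case that keeps the fast path is the one where the user
has declared the index unsigned explicitly -- a rough sketch in the proposed
(not final) syntax, assuming n and arr are declared elsewhere:

cdef unsigned int i        # explicitly unsigned: can never be negative
for i in range(n):
    arr[i, 0] = 1.0        # so no negative-index branch is needed here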

Dag Sverre






Re: [Numpy-discussion] On my Cython/NumPy project

2008-06-21 Thread Dag Sverre Seljebotn
Dag wrote:
 General feedback is welcome; in particular, I need more opinions about
 what syntax people would like. We seem unable to find something that we
 really like; this is the current best candidate (cdef is the way you
 declare types on variables in Cython):

 cdef int i = 4, j = 6
 cdef np.ndarray[np.float64, 2] arr = np.zeros((10, 10), dtype=np.float64)
 arr[i, j] = 1
 ...


Some more important points:
- There will likely be a compiler flag (and perhaps per-block or
per-variable pragmas) controlling whether bounds-checking is done or not.
- It doesn't look like negative indices will be supported in this mode, as
it adds another branch in potentially very tight loops. Opinions on this
are welcome though. In safe mode, bounds checking will catch it, while in
unsafe mode it will at best segfault and at worst corrupt data.

The negative indices thing is potentially confusing to new users. Pex uses
a different syntax (arr{i, j} for efficient array lookups) partly to make
this fact very explicit. Thoughts?
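
For reference, the extra branch in question is just the Python-style
wraparound test; in Cython-level pseudocode it amounts to something like:

if i < 0:                  # the per-access check that negative-index support would cost
    i += arr.shape[0]      # Python-style wraparound before the buffer lookup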

Dag Sverre



Re: [Numpy-discussion] On my Cython/NumPy project

2008-06-21 Thread Joris De Ridder
Hi Dag,

 General feedback is welcome; in particular, I need more opinions about
 what syntax people would like. We seem unable to find something that we
 really like; this is the current best candidate (cdef is the way you
 declare types on variables in Cython):

 cdef int i = 4, j = 6
 cdef np.ndarray[np.float64, 2] arr = np.zeros((10, 10), dtype=np.float64)
 arr[i, j] = 1
 ...

 The code above, under the hood, acquires a buffer and uses it for
 efficient access. arr[5], arr[5,6,7], arr[2:4,...] and in general anything
 but two simple indices will be passed to Python, while arr[i, j] will be
 passed to the buffer.

The syntax looks quite good, I think.
Some questions though:
- I guess there are some technical reasons why np.float64 in your
example has to be repeated?
- When you say negative indices are not supported, do you mean that they
are passed to Python, or won't they work at all?

I'm looking forward to the results of your Cython project!

Cheers,
Joris





Re: [Numpy-discussion] On my Cython/NumPy project

2008-06-21 Thread Gael Varoquaux
On Sat, Jun 21, 2008 at 08:59:25AM +0200, Dag Sverre Seljebotn wrote:
 The negative indices thing is potentially confusing to new users. Pex uses
 a different syntax (arr{i, j} for efficient array lookups) partly to make
 this fact very explicit. Thoughts?

I don't like the different syntax. I would really be happy if Cython code
could stay as close as possible to Python code. In particular, we want to
make it as easy as possible to prototype code in Python and then move it to
Cython.

My 2 cents,

Gaël



Re: [Numpy-discussion] On my Cython/NumPy project

2008-06-21 Thread Matthew Brett
Hi,

Thanks a lot for the email - it's an exciting project.

 cdef int i = 4, j = 6
 cdef np.ndarray[np.float64, 2] arr = np.zeros((10, 10), dtype=np.float64)
 arr[i, j] = 1

I'm afraid I'm going to pitch into an area I'm ignorant of, and I'm
sorry for that, but do you think there is any way of allowing fast
indexing and operations on arrays that do not have their type declared
at compile time?

Cheers,

Matthew


Re: [Numpy-discussion] On my Cython/NumPy project

2008-06-21 Thread Dag Sverre Seljebotn
Matthew Brett wrote:
 Hi,

 Thanks a lot for the email - it's an exciting project.

 cdef int i = 4, j = 6
 cdef np.ndarray[np.float64, 2] arr = np.zeros((10, 10), dtype=np.float64)
 arr[i, j] = 1

 I'm afraid I'm going to pitch into an area I'm ignorant of, and I'm
 sorry for that, but do you think there is any way of allowing fast
 indexing and operations on arrays that do not have their type declared
 at compile time?

Well... the Cython compiler definitely needs to know which type it
is -- the array simply has a byte buffer, and one needs to know how to
interpret it (how many bytes per element, etc.).

However, I could make it so that if you left out the type, it would be
auto-detected. I.e.:

cdef np.ndarray[2] arr = ...
cdef np.float64 x = arr[3,4]

However, if you then did something like this on a 32-bit system:

cdef np.ndarray[2] arr = ...
print arr[3,4]

Then it would assume the elements of arr are Python objects, and if arr
really holds float64 values you would segfault.

So the real answer is: Yes, the type is needed.

Dag Sverre



Re: [Numpy-discussion] On my Cython/NumPy project

2008-06-21 Thread Fernando Perez
On Fri, Jun 20, 2008 at 11:50 PM, Dag Sverre Seljebotn
[EMAIL PROTECTED] wrote:
 Since there's been a lot of Cython discussion lately I thought I'd speak
 up and start a thread specifically for my project.

Thanks for coming over for the discussion!

 The code above, under the hood, acquires a buffer and uses it for
 efficient access. arr[5], arr[5,6,7], arr[2:4,...] and in general anything
 but two simple indices will be passed to Python, while arr[i, j] will be
 passed to the buffer.

Just so I understand correctly: do you mean that in general only 2
indices are supported 'natively', or that for an n-dim array, only
*exactly n* indices are supported natively and other approaches are
delegated to pure python?

Cheers,

f


Re: [Numpy-discussion] On my Cython/NumPy project

2008-06-21 Thread Dag Sverre Seljebotn
Joris De Ridder wrote:
 Hi Dag,

 General feedback is welcome; in particular, I need more opinions about
 what syntax people would like. We seem unable to find something that we
 really like; this is the current best candidate (cdef is the way you
 declare types on variables in Cython):

 cdef int i = 4, j = 6
 cdef np.ndarray[np.float64, 2] arr = np.zeros((10, 10), dtype=np.float64)
 arr[i, j] = 1
 ...

 The code above, under the hood, acquires a buffer and uses it for
 efficient access. arr[5], arr[5,6,7], arr[2:4,...] and in general anything
 but two simple indices will be passed to Python, while arr[i, j] will be
 passed to the buffer.

 The syntax looks quite good, I think.
 Some questions though:
 - I guess there are some technical reasons why np.float64 in your
 example has to be repeated?

There are all sorts of reasons. The right hand side is disconnected from
the left hand side. It could just as well say

cdef np.ndarray[float64, 2] arr = my_func()
or
cdef np.ndarray[float64, 2] arr = x
or
cdef np.ndarray[float64, 2] arr = np.zeros((3, 3),
dtype=np.int).astype(np.float64)

(If you assign a Python object that is not an ndarray of the right
dimensions and type, a runtime exception is raised.)
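
For example (illustrative only), each of these would trip that runtime check:

cdef np.ndarray[np.float64, 2] arr = np.zeros(10)                      # wrong number of dimensions
cdef np.ndarray[np.float64, 2] arr2 = np.zeros((3, 3), dtype=np.int)   # wrong dtype
cdef np.ndarray[np.float64, 2] arr3 = [[1.0, 2.0]]                     # not an ndarray at all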

So: If you can figure out a nice rule for not having to repeat the type
I'm all ears, but, really, the repeating of the type was only incidental.

 - When you say negative indices are not supported, do you mean that they
 are passed to Python, or won't they work at all?

In unsafe mode: if you are really lucky they will segfault your
application; usually, however, you will not be that fortunate and they will
corrupt your data instead.

In safe mode, they will raise an IndexError.

(Passing them to Python would require knowing that they are negative in
the first place, and knowing that they are negative would require an extra
if-test in tight, tight inner loops.)

(Well, of course, if you actually write arr[-3, -2] I suppose we can raise
a compiler error; the point is when the negative values are stored in
variables -- how do you know they are negative without checking?)

Dag Sverre



Re: [Numpy-discussion] On my Cython/NumPy project

2008-06-21 Thread Dag Sverre Seljebotn
Fernando Perez wrote:
 On Fri, Jun 20, 2008 at 11:50 PM, Dag Sverre Seljebotn
 [EMAIL PROTECTED] wrote:
 Since there's been a lot of Cython discussion lately I thought I'd speak
 up and start a thread specifically for my project.

 Thanks for coming over for the discussion!

 The code above, under the hood, acquires a buffer and uses it for
 efficient access. arr[5], arr[5,6,7], arr[2:4,...] and in general anything
 but two simple indices will be passed to Python, while arr[i, j] will be
 passed to the buffer.

 Just so I understand correctly: do you mean that in general only 2
 indices are supported 'natively', or that for an n-dim array, only
 *exactly n* indices are supported natively and other approaches are
 delegated to pure python?

That is the idea. At least for NumPy, all the other cases will generate
new arrays, so they tend to be a different kind of operation. One can
look into improving slicing efficiency, etc., later.

Dag Sverre



Re: [Numpy-discussion] On my Cython/NumPy project

2008-06-21 Thread Stéfan van der Walt
Hi Dag

2008/6/21 Dag Sverre Seljebotn [EMAIL PROTECTED]:
 However, I could make it so that if you left out the type, it would be
 auto-detected. I.e.:

 cdef np.ndarray[2] arr = ...
 cdef np.float64 x = arr[3,4]

Would it not be possible for Cython to make the necessary C-API calls
to query the dimensions and type of the ndarray?  I don't really know
the full background here, so sorry if my comment is completely
off-base.

Eventually, I'd love for Cython code to look very similar to
Python code. Even if that is not technically feasible at the moment,
the eventual target should be to have as little markup as
possible.

Regards
Stéfan


Re: [Numpy-discussion] On my Cython/NumPy project

2008-06-21 Thread Anne Archibald
2008/6/21 Dag Sverre Seljebotn [EMAIL PROTECTED]:
 Dag wrote:
 General feedback is welcome; in particular, I need more opinions about
 what syntax people would like. We seem unable to find something that we
 really like; this is the current best candidate (cdef is the way you
 declare types on variables in Cython):

 cdef int i = 4, j = 6
 cdef np.ndarray[np.float64, 2] arr = np.zeros((10, 10), dtype=np.float64)
 arr[i, j] = 1
 ...


 Some more important points:
 - There will likely be a compiler flag (and perhaps per-block or
 per-variable pragmas) controlling whether bounds-checking is done or not.
 - It doesn't look like negative indices will be supported in this mode, as
 it adds another branch in potentially very tight loops. Opinions on this
 are welcome though. In safe mode, bounds checking will catch it, while in
 unsafe mode it will at best segfault and at worst corrupt data.

 The negative indices thing is potentially confusing to new users. Pex uses
 a different syntax (arr{i, j} for efficient array lookups) partly to make
 this fact very explicit. Thoughts?

I am very worried about the negative numbers issue. It's the sort of
thing that will readily lead to errors, and that produces a
significant difference between Cython and Python. I understand the
performance issues that motivate it, but Cython really needs to be
easy to use or we might as well just use C. As I understand it, the
ultimate goal should be for users to be able to compile arbitrary
Python code for tiny speed improvements, then add type annotations for
key variables to obtain large speed improvements. Forbidding negative
indices now works against that goal.

My suggestion is this: allow negative indices, accepting the cost in
tight loops. (If bounds checking is enabled, the cost will be
negligible anyway.) Provide a #pragma allowing the user to assert that
a certain piece of code uses no negative indices. Ultimately, the
Cython compiler will often be able to deduce that negative indices
will not be used within a particular loop and supply the pragma
itself, but in the meantime this solution allows people to use the
extremely convenient negative indices without causing mysterious
problems, and it sets things up for the future. (Other compiler-based
cleverness can deal with the most common case by converting explicit
negative indices.)
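
To be concrete (the spelling of the pragma below is entirely made up -- it is
the idea, not the syntax, that matters):

#pragma no_negative_indices     # hypothetical: user asserts no negative indices below
for i in range(n):
    arr[i, j] = 0.0             # compiler may then drop the wraparound branch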



Anne


Re: [Numpy-discussion] On my Cython/NumPy project

2008-06-21 Thread Dag Sverre Seljebotn
Anne Archibald wrote:
 2008/6/21 Dag Sverre Seljebotn [EMAIL PROTECTED]:
 Dag wrote:
 General feedback is welcome; in particular, I need more opinions about
 what syntax people would like. We seem unable to find something that we
 really like; this is the current best candidate (cdef is the way you
 declare types on variables in Cython):

 cdef int i = 4, j = 6
 cdef np.ndarray[np.float64, 2] arr = np.zeros((10, 10), dtype=np.float64)
 arr[i, j] = 1
 ...


 Some more important points:
 - There will likely be a compiler flag (and perhaps per-block or
 per-variable pragmas) controlling whether bounds-checking is done or not.
 - It doesn't look like negative indices will be supported in this mode, as
 it adds another branch in potentially very tight loops. Opinions on this
 are welcome though. In safe mode, bounds checking will catch it, while in
 unsafe mode it will at best segfault and at worst corrupt data.

 The negative indices thing is potentially confusing to new users. Pex uses
 a different syntax (arr{i, j} for efficient array lookups) partly to make
 this fact very explicit. Thoughts?

 I am very worried about the negative numbers issue. It's the sort of
 thing that will readily lead to errors, and that produces a
 significant difference between Cython and Python. I understand the
 performance issues that motivate it, but Cython really needs to be
 easy to use or we might as well just use C. As I understand it, the
 ultimate goal should be for users to be able to compile arbitrary
 Python code for tiny speed improvements, then add type annotations for
 key variables to obtain large speed improvements. Forbidding negative
 indices now works against that goal.

 My suggestion is this: allow negative indices, accepting the cost in
 tight loops. (If bounds checking is enabled, the cost will be
 negligible anyway.) Provide a #pragma allowing the user to assert that
 a certain piece of code uses no negative indices. Ultimately, the
 Cython compiler will often be able to deduce that negative indices
 will not be used within a particular loop and supply the pragma
 itself, but in the meantime this solution allows people to use the
 extremely convenient negative indices without causing mysterious
 problems, and it sets things up for the future. (Other compiler-based
 cleverness can deal with the most common case by converting explicit
 negative indices.)

Thank you for supplying such useful feedback! It is likely that you have
convinced me.

Dag Sverre



Re: [Numpy-discussion] On my Cython/NumPy project

2008-06-21 Thread Rahul Garg
 I am very worried about the negative numbers issue. It's the sort of
 thing that will readily lead to errors, and that produces a
 significant difference between Cython and Python. I understand the
 performance issues that motivate it, but Cython really needs to be
 easy to use or we might as well just use C. As I understand it, the
 ultimate goal should be for users to be able to compile arbitrary
 Python code for tiny speed improvements, then add type annotations for
 key variables to obtain large speed improvements. Forbidding negative
 indices now works against that goal.

Since this also applies to my compiler unPython, I am adding this note.
In unPython, negative indices are currently not supported, but I will
support them in the next release. There will be a global compiler
command-line switch that will turn negative-index support on or off.
The compiler will also try to infer whether the indices are positive
in a given loop.

thanks,
rahul



Re: [Numpy-discussion] On my Cython/NumPy project

2008-06-21 Thread Matthew Brett
Hi,

 The feature of compiling code for multiple types is somewhat orthogonal to
 ndarray support; better treat them separately and take one at a time.

Well, it's relevant to numpy because if you want to implement - for
example - a numpy sort, then you've got to deal with an unspecified
number of dimensions, and one of a range of specified types, otherwise
you will end up copying, casting, rejecting types and so on...

Best,

Matthew


Re: [Numpy-discussion] On my Cython/NumPy project

2008-06-21 Thread Robert Kern
On Sat, Jun 21, 2008 at 17:08, Anne Archibald [EMAIL PROTECTED] wrote:
 My suggestion is this: allow negative indices, accepting the cost in
 tight loops. (If bounds checking is enabled, the cost will be
 negligible anyway.) Provide a #pragma allowing the user to assert that
 a certain piece of code uses no negative indices.

Instead of a #pragma, you could rely on the type of the index. If it
is unsigned, you can do the fast path; if it is signed, you need to
check for and handle potential negatives.
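
In sketch form (this is a rule for the compiler, not new user-visible syntax):

cdef unsigned int i    # unsigned: the compiler can emit the direct buffer lookup
cdef int j             # signed: the compiler must first test for / wrap a negative value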

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
 -- Umberto Eco


Re: [Numpy-discussion] On my Cython/NumPy project

2008-06-21 Thread Anne Archibald
2008/6/21 Robert Kern [EMAIL PROTECTED]:
 On Sat, Jun 21, 2008 at 17:08, Anne Archibald [EMAIL PROTECTED] wrote:
 My suggestion is this: allow negative indices, accepting the cost in
 tight loops. (If bounds checking is enabled, the cost will be
 negligible anyway.) Provide a #pragma allowing the user to assert that
 a certain piece of code uses no negative indices.

 Instead of a #pragma, you could rely on the type of the index. If it
 is unsigned, you can do the fast path; if it is signed, you need to
 check for and handle potential negatives.

Cute! And then it's easy to make for i in range(n) produce unsigned
results with no effort on the part of the user.

Anne


Re: [Numpy-discussion] On my Cython/NumPy project

2008-06-21 Thread Robert Kern
On Sat, Jun 21, 2008 at 18:45, Anne Archibald [EMAIL PROTECTED] wrote:
 2008/6/21 Robert Kern [EMAIL PROTECTED]:
 On Sat, Jun 21, 2008 at 17:08, Anne Archibald [EMAIL PROTECTED] wrote:
 My suggestion is this: allow negative indices, accepting the cost in
 tight loops. (If bounds checking is enabled, the cost will be
 negligible anyway.) Provide a #pragma allowing the user to assert that
 a certain piece of code uses no negative indices.

 Instead of a #pragma, you could rely on the type of the index. If it
 is unsigned, you can do the fast path; if it is signed, you need to
 check for and handle potential negatives.

 Cute! And then it's easy to make for i in range(n) produce unsigned
 results with no effort on the part of the user.

Not entirely. You still need to declare cdef unsigned int i. Cython
does not translate the for loop into fast C unless the variable has
been declared.
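
That is, the pattern that gets the fast loop looks something like this
(sketch; n is assumed to be a C integer declared elsewhere):

cdef unsigned int i    # without this declaration the loop stays a Python-level loop
for i in range(n):
    pass               # with i declared, Cython emits a plain C for loop here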

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
 -- Umberto Eco