Re: [Numpy-discussion] On my Cython/NumPy project
Matthew Brett wrote:
> Hi,
>
>> The feature of compiling code for multiple types is somewhat orthogonal to ndarray support; better treat them separately and take one at a time.
>
> Well, it's relevant to numpy because if you want to implement - for example - a numpy sort, then you've got to deal with an unspecified number of dimensions, and one of a range of specified types, otherwise you will end up copying, casting, rejecting types and so on...

Thanks for the feedback. Sure; I didn't mean to imply that it was irrelevant. Indeed, I think it would be very useful to NumPy users. I just mean to say that it is orthogonal -- it is also important, but should be treated as a separate feature that is independent of NumPy support as such.

To give you a flavor of why this is a nontrivial issue, consider a function with four different numpy arrays. Then you would need to create approx. 15**4 different versions of it, one for each combination of types -- not feasible at all. That is, you need some way of specifying that these two arrays have the same datatype, and so on, i.e. some kind of generalized/template programming. I'd rather focus on the easy case for now, without precluding more features later.

When it comes to not knowing the number of dimensions, the right way to go about that would be to support NumPy dimension-neutral array iterators in a nice way. I have a lot of thoughts about that too, but it's lower on the priority list and I'll wait to discuss it until the more direct case is solved (as that is the one new NumPy/Cython users will miss most).

On the negative indices side: I really like the unsigned int idea. The problem with range is that it can, at least in theory, grow larger than MAX_INT, and so we are cautious about automatically inferring that the iterator variable should be unsigned int (Python ints can be arbitrarily large). Of course, one could compile two versions of each loop and have an if-statement... Again, I'll probably just drop the negative test for explicitly declared unsigned int for now, while having range imply unsigned int will be dealt with together with more general type inference, which Cython developers are also thinking about (though whether we'll have the developer resources for it is another issue).

Dag Sverre

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
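A rough sketch of the arithmetic behind the "approx. 15**4" estimate above. The count of 15 element types is taken from that estimate; the names `unconstrained`, `pairwise` and `all_same` are only illustrative, and the "tied" cases assume some hypothetical way of declaring that several arrays share one datatype.

```python
# Assumption: about 15 element types, as in the "approx. 15**4" estimate.
N_DTYPES = 15
N_ARRAYS = 4

# Every array may independently have any type:
# one compiled version per combination.
unconstrained = N_DTYPES ** N_ARRAYS

# Hypothetical constraint: arrays tied in two pairs (a with b, c with d),
# leaving two independent type choices.
pairwise = N_DTYPES ** 2

# Hypothetical constraint: all four arrays share a single type.
all_same = N_DTYPES

print(unconstrained, pairwise, all_same)
```

Being able to state such constraints is what would bring the number of versions down from tens of thousands to something a compiler could plausibly generate.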
Re: [Numpy-discussion] On my Cython/NumPy project
Dag wrote:
> General feedback is welcome; in particular, I need more opinions about what syntax people would like. We seem unable to find something that we really like; this is the current best candidate (cdef is the way you declare types on variables in Cython):
>
>     cdef int i = 4, j = 6
>     cdef np.ndarray[np.float64, 2] arr = np.zeros((10, 10), dtype=np.float64)
>     arr[i, j] = 1
>     ...

Some more important points:

- There will likely be a compiler flag (and perhaps per-block or per-variable pragmas) controlling whether bounds-checking is done.
- It doesn't look like negative indices will be supported in this mode, as it adds another branch in potentially very tight loops. Opinions on this are welcome though. In safe mode, bounds checking will catch it, while in unsafe mode it will at best segfault and at worst corrupt data.

The negative indices thing is potentially confusing to new users. Pex uses a different syntax (arr{i, j} for efficient array lookups) partly to make this fact very explicit. Thoughts?

Dag Sverre
Re: [Numpy-discussion] On my Cython/NumPy project
Hi Dag,

> General feedback is welcome; in particular, I need more opinions about what syntax people would like. We seem unable to find something that we really like; this is the current best candidate (cdef is the way you declare types on variables in Cython):
>
>     cdef int i = 4, j = 6
>     cdef np.ndarray[np.float64, 2] arr = np.zeros((10, 10), dtype=np.float64)
>     arr[i, j] = 1
>     ...
>
> The code above acquires a buffer under the hood and uses it for efficient access. arr[5], arr[5,6,7], arr[2:4,...] and in general anything but two simple indices will be passed to Python, while arr[i, j] will be passed to the buffer.

The syntax looks quite good, I think. Some questions though:

- I guess there are some technical reasons why np.float64 in your example has to be repeated twice?
- When you say negative indices are not supported, do you mean that they are passed to Python, or that they won't work at all?

I'm looking forward to the results of your Cython project!

Cheers,
Joris
Re: [Numpy-discussion] On my Cython/NumPy project
On Sat, Jun 21, 2008 at 08:59:25AM +0200, Dag Sverre Seljebotn wrote:
> The negative indices thing is potentially confusing to new users. Pex uses a different syntax (arr{i, j} for efficient array lookups) partly to make this fact very explicit. Thoughts?

I don't like the different syntax. I would really be happy if Cython code could stay as close as possible to python code. In particular we want to make it as easy as possible to prototype code in Python and move it to Cython.

My 2 cents,

Gaël
Re: [Numpy-discussion] On my Cython/NumPy project
Hi,

Thanks a lot for the email - it's an exciting project.

>     cdef int i = 4, j = 6
>     cdef np.ndarray[np.float64, 2] arr = np.zeros((10, 10), dtype=np.float64)
>     arr[i, j] = 1

I'm afraid I'm going to pitch into an area I'm ignorant of, and I'm sorry for that, but do you think there is any way of allowing fast indexing and operations on arrays that do not have their type declared at compile time?

Cheers,

Matthew
Re: [Numpy-discussion] On my Cython/NumPy project
Matthew Brett wrote:
> Hi,
>
> Thanks a lot for the email - it's an exciting project.
>
>>     cdef int i = 4, j = 6
>>     cdef np.ndarray[np.float64, 2] arr = np.zeros((10, 10), dtype=np.float64)
>>     arr[i, j] = 1
>
> I'm afraid I'm going to pitch into an area I'm ignorant of, and I'm sorry for that, but do you think there is any way of allowing fast indexing and operations on arrays that do not have their type declared at compile time?

Well... the Cython compiler definitely needs to know which type it is -- the array simply has a byte buffer, and one needs to know how to interpret that (how many bytes per element etc.).

However, I could make it so that if you left out the type, it would be auto-detected. I.e.:

    cdef np.ndarray[2] arr = ...
    cdef np.float64 x = arr[3,4]

However, if you then did something like this on a 32-bit system:

    cdef np.ndarray[2] arr = ...
    print arr[3,4]

then it would assume arr holds objects, and if arr is really float64 you would segfault.

So the real answer is: Yes, the type is needed.

Dag Sverre
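A small illustration of the "byte buffer" point above, using only the standard library rather than NumPy or Cython. It is a sketch of why the element type has to be known, not of what Cython generates; the variable names are illustrative, and little-endian byte order is chosen explicitly so the result does not depend on the machine.

```python
import struct

# Eight raw bytes holding the float64 value 1.0
# (little-endian, chosen explicitly).
raw = struct.pack("<d", 1.0)

# Interpreted with the correct element type: one 8-byte float.
as_float64 = struct.unpack("<d", raw)[0]

# The very same bytes interpreted as two 4-byte signed integers.
as_int32_pair = struct.unpack("<2i", raw)

print(as_float64)      # 1.0
print(as_int32_pair)   # (0, 1072693248)
```

The bytes themselves carry no type; reading them with the wrong element size or kind gives plausible-looking garbage, and if they were treated as object pointers they would be dereferenced as addresses.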
Re: [Numpy-discussion] On my Cython/NumPy project
On Fri, Jun 20, 2008 at 11:50 PM, Dag Sverre Seljebotn [EMAIL PROTECTED] wrote:
> Since there's been a lot of Cython discussion lately I thought I'd speak up and start a thread specifically for my project.

Thanks for coming over for the discussion!

> The code above acquires a buffer under the hood and uses it for efficient access. arr[5], arr[5,6,7], arr[2:4,...] and in general anything but two simple indices will be passed to Python, while arr[i, j] will be passed to the buffer.

Just so I understand correctly: do you mean that in general only 2 indices are supported 'natively', or that for an n-dim array, only *exactly n* indices are supported natively and other approaches are delegated to pure python?

Cheers,

f
Re: [Numpy-discussion] On my Cython/NumPy project
Joris De Ridder wrote:
> Hi Dag,
>
>> General feedback is welcome; in particular, I need more opinions about what syntax people would like. We seem unable to find something that we really like; this is the current best candidate (cdef is the way you declare types on variables in Cython):
>>
>>     cdef int i = 4, j = 6
>>     cdef np.ndarray[np.float64, 2] arr = np.zeros((10, 10), dtype=np.float64)
>>     arr[i, j] = 1
>>     ...
>>
>> The code above acquires a buffer under the hood and uses it for efficient access. arr[5], arr[5,6,7], arr[2:4,...] and in general anything but two simple indices will be passed to Python, while arr[i, j] will be passed to the buffer.
>
> The syntax looks quite good, I think. Some questions though:
>
> - I guess there are some technical reasons why np.float64 in your example has to be repeated twice?

There are all sorts of reasons. The right hand side is disconnected from the left hand side. It could just as well say

    cdef np.ndarray[float64, 2] arr = my_func()

or

    cdef np.ndarray[float64, 2] arr = x

or

    cdef np.ndarray[float64, 2] arr = np.zeros((3, 3), dtype=np.int).astype(np.float64)

(If you assign a Python object that is not an ndarray of the right dimensions and type, a runtime exception is raised.)

So: If you can figure out a nice rule for not having to repeat the type I'm all ears, but, really, the repeating of the type was only incidental.

> - When you say negative indices are not supported you mean that they are passed to python, or won't they work at all?

In unsafe mode: If you are really lucky they will segfault your application; usually, however, you will not be that fortunate and they will corrupt your data instead.

In safe mode, they will raise an IndexError.

(Passing them to Python would require knowing that they are negative in the first place. And knowing that they are negative would require an extra if-test in tight, tight inner loops.)

(Well, of course, if you actually do arr[-3, -2] I suppose we can raise a compiler error. The point is what happens when the negative values are stored in variables -- how do you know if they are negative without checking?)

Dag Sverre
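A pure-Python sketch of the unsafe/safe distinction described above. It assumes a C-contiguous (row-major) 10 x 10 array and computes element offsets the way unchecked buffer access would; `flat_offset` and `checked_offset` are hypothetical names used only for this illustration, not Cython internals.

```python
def flat_offset(index, shape):
    """Row-major element offset with no bounds check and no negative wrap."""
    offset = 0
    for k, n in zip(index, shape):
        offset = offset * n + k
    return offset


def checked_offset(index, shape):
    """The same computation in 'safe mode': reject anything outside 0 <= k < n."""
    for k, n in zip(index, shape):
        if not 0 <= k < n:
            raise IndexError(f"index {k} out of bounds for axis of size {n}")
    return flat_offset(index, shape)


shape = (10, 10)
inside = flat_offset((4, 6), shape)           # 46: the intended element
stray = flat_offset((4, -2), shape)           # 38: silently hits element [3, 8]
before_buffer = flat_offset((-3, -2), shape)  # -32: before the start of the buffer

print(inside, stray, before_buffer)
```

The `stray` case is the "corrupt your data" outcome: the offset is valid memory but the wrong element. The `before_buffer` case is the one that may segfault. `checked_offset` raises `IndexError` for both.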
Re: [Numpy-discussion] On my Cython/NumPy project
Fernando Perez wrote:
> On Fri, Jun 20, 2008 at 11:50 PM, Dag Sverre Seljebotn [EMAIL PROTECTED] wrote:
>
>> Since there's been a lot of Cython discussion lately I thought I'd speak up and start a thread specifically for my project.
>
> Thanks for coming over for the discussion!
>
>> The code above acquires a buffer under the hood and uses it for efficient access. arr[5], arr[5,6,7], arr[2:4,...] and in general anything but two simple indices will be passed to Python, while arr[i, j] will be passed to the buffer.
>
> Just so I understand correctly: do you mean that in general only 2 indices are supported 'natively', or that for an n-dim array, only *exactly n* indices are supported natively and other approaches are delegated to pure python?

That is the idea. At least for NumPy, all the other cases will generate new arrays, so they tend to be a different kind of operation. One can look into improving slicing efficiency etc. later.

Dag Sverre
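The "exactly n simple indices" rule discussed above can be sketched as a small predicate. In the proposal this decision would be made by the compiler from declared types; the runtime function below, with the hypothetical name `is_fast_path`, only illustrates which index expressions from the thread fall on which side.

```python
def is_fast_path(index, ndim):
    """Return True when `index` is exactly `ndim` plain integer indices."""
    if not isinstance(index, tuple):
        index = (index,)  # arr[5] is a single index, not a tuple
    return len(index) == ndim and all(isinstance(k, int) for k in index)


# The cases named in the thread, for a 2-dimensional array:
print(is_fast_path((4, 6), 2))                  # arr[i, j]
print(is_fast_path(5, 2))                       # arr[5]
print(is_fast_path((5, 6, 7), 2))               # arr[5,6,7]
print(is_fast_path((slice(2, 4), Ellipsis), 2)) # arr[2:4,...]
```

Only the first call returns True; the others would be handed to ordinary Python indexing.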
Re: [Numpy-discussion] On my Cython/NumPy project
Hi Dag

2008/6/21 Dag Sverre Seljebotn [EMAIL PROTECTED]:
> However, I could make it so that if you left out the type, it would be auto-detected. I.e.:
>
>     cdef np.ndarray[2] arr = ...
>     cdef np.float64 x = arr[3,4]

Would it not be possible for Cython to make the necessary C-API calls to query the dimensions and type of the ndarray? I don't really know the full background here, so sorry if my comment is completely off-base.

Eventually, I'd love for the Cython code to look very similar to Python code. Even if that is not technically feasible at the moment, it should be the eventual target to have as little markup as possible.

Regards

Stéfan
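For illustration only: element type and dimensionality can indeed be queried at run time through the buffer interface, which is what makes a runtime check of a declaration possible (an earlier message in the thread says a mismatch raises a runtime exception). The sketch below uses Python 3's built-in `memoryview` as a stand-in for an ndarray; `acquire` is a hypothetical helper, not a Cython or NumPy function. The compiler would still need the declared type to generate the element access code.

```python
from array import array


def acquire(obj, expected_format, expected_ndim):
    """Acquire a buffer view and check it against the declared type and rank."""
    view = memoryview(obj)  # asks the exporting object at run time
    if view.format != expected_format:
        raise TypeError(
            f"expected element format {expected_format!r}, got {view.format!r}"
        )
    if view.ndim != expected_ndim:
        raise TypeError(f"expected {expected_ndim} dimensions, got {view.ndim}")
    return view


flat = array("d", [0.0] * 6)  # six C doubles, one dimension
# The same memory viewed as a 2 x 3 block of doubles.
grid = memoryview(flat).cast("B").cast("d", shape=[2, 3])

ok = acquire(grid, "d", 2)
print(ok.shape, ok.format, ok.itemsize)
```

Passing `flat` with `expected_ndim=2`, or an integer array with `expected_format="d"`, raises `TypeError` instead of returning a view.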
Re: [Numpy-discussion] On my Cython/NumPy project
2008/6/21 Dag Sverre Seljebotn [EMAIL PROTECTED]:
> Dag wrote:
>> General feedback is welcome; in particular, I need more opinions about what syntax people would like. We seem unable to find something that we really like; this is the current best candidate (cdef is the way you declare types on variables in Cython):
>>
>>     cdef int i = 4, j = 6
>>     cdef np.ndarray[np.float64, 2] arr = np.zeros((10, 10), dtype=np.float64)
>>     arr[i, j] = 1
>>     ...
>
> Some more important points:
>
> - There will likely be a compiler flag (and perhaps per-block or per-variable pragmas) controlling whether bounds-checking is done.
> - It doesn't look like negative indices will be supported in this mode, as it adds another branch in potentially very tight loops. Opinions on this are welcome though. In safe mode, bounds checking will catch it, while in unsafe mode it will at best segfault and at worst corrupt data.
>
> The negative indices thing is potentially confusing to new users. Pex uses a different syntax (arr{i, j} for efficient array lookups) partly to make this fact very explicit. Thoughts?

I am very worried about the negative numbers issue. It's the sort of thing that will readily lead to errors, and that produces a significant difference between cython and python. I understand the performance issues that motivate it, but cython really needs to be easy to use or we might as well just use C. As I understand it, the ultimate goal should be for users to be able to compile arbitrary python code for tiny speed improvements, then add type annotations for key variables to obtain large speed improvements. Forbidding negative indices now prevents that goal.

My suggestion is this: allow negative indices, accepting the cost in tight loops. (If bounds checking is enabled, the cost will be negligible anyway.) Provide a #pragma allowing the user to assert that a certain piece of code uses no negative indices. Ultimately, the cython compiler will often be able to deduce that negative indices will not be used within a particular loop and supply the pragma itself, but in the meantime this solution allows people to use the extremely convenient negative indices without causing mysterious problems, and it sets things up for the future. (Other compiler-based cleverness can deal with the most common case by converting explicit negative indices.)

Anne
Re: [Numpy-discussion] On my Cython/NumPy project
Anne Archibald wrote:
> 2008/6/21 Dag Sverre Seljebotn [EMAIL PROTECTED]:
>> Dag wrote:
>>> General feedback is welcome; in particular, I need more opinions about what syntax people would like. We seem unable to find something that we really like; this is the current best candidate (cdef is the way you declare types on variables in Cython):
>>>
>>>     cdef int i = 4, j = 6
>>>     cdef np.ndarray[np.float64, 2] arr = np.zeros((10, 10), dtype=np.float64)
>>>     arr[i, j] = 1
>>>     ...
>>
>> Some more important points:
>>
>> - There will likely be a compiler flag (and perhaps per-block or per-variable pragmas) controlling whether bounds-checking is done.
>> - It doesn't look like negative indices will be supported in this mode, as it adds another branch in potentially very tight loops. Opinions on this are welcome though. In safe mode, bounds checking will catch it, while in unsafe mode it will at best segfault and at worst corrupt data.
>>
>> The negative indices thing is potentially confusing to new users. Pex uses a different syntax (arr{i, j} for efficient array lookups) partly to make this fact very explicit. Thoughts?
>
> I am very worried about the negative numbers issue. It's the sort of thing that will readily lead to errors, and that produces a significant difference between cython and python. I understand the performance issues that motivate it, but cython really needs to be easy to use or we might as well just use C. As I understand it, the ultimate goal should be for users to be able to compile arbitrary python code for tiny speed improvements, then add type annotations for key variables to obtain large speed improvements. Forbidding negative indices now prevents that goal.
>
> My suggestion is this: allow negative indices, accepting the cost in tight loops. (If bounds checking is enabled, the cost will be negligible anyway.) Provide a #pragma allowing the user to assert that a certain piece of code uses no negative indices. Ultimately, the cython compiler will often be able to deduce that negative indices will not be used within a particular loop and supply the pragma itself, but in the meantime this solution allows people to use the extremely convenient negative indices without causing mysterious problems, and it sets things up for the future. (Other compiler-based cleverness can deal with the most common case by converting explicit negative indices.)

Thank you for supplying such useful feedback! It is likely that you have convinced me.

Dag Sverre
Re: [Numpy-discussion] On my Cython/NumPy project
> I am very worried about the negative numbers issue. It's the sort of thing that will readily lead to errors, and that produces a significant difference between cython and python. I understand the performance issues that motivate it, but cython really needs to be easy to use or we might as well just use C. As I understand it, the ultimate goal should be for users to be able to compile arbitrary python code for tiny speed improvements, then add type annotations for key variables to obtain large speed improvements. Forbidding negative indices now prevents that goal.

Since this also applies to my compiler unPython, I am adding this note. In unPython, negative indices are currently not supported, but I will support them in the next release. There will be a global compiler command-line switch that turns negative indices on or off. The compiler will also try to infer whether the indices are positive in a given loop.

thanks,
rahul
Re: [Numpy-discussion] On my Cython/NumPy project
Hi,

> The feature of compiling code for multiple types is somewhat orthogonal to ndarray support; better treat them separately and take one at a time.

Well, it's relevant to numpy because if you want to implement - for example - a numpy sort, then you've got to deal with an unspecified number of dimensions, and one of a range of specified types, otherwise you will end up copying, casting, rejecting types and so on...

Best,

Matthew
Re: [Numpy-discussion] On my Cython/NumPy project
On Sat, Jun 21, 2008 at 17:08, Anne Archibald [EMAIL PROTECTED] wrote:
> My suggestion is this: allow negative indices, accepting the cost in tight loops. (If bounds checking is enabled, the cost will be negligible anyway.) Provide a #pragma allowing the user to assert that a certain piece of code uses no negative indices.

Instead of a #pragma, you could rely on the type of the index. If it is unsigned, you can do the fast path; if it is signed, you need to check for and handle potential negatives.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
  -- Umberto Eco
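A minimal sketch of the suggestion above, in plain Python. In Cython the choice would be made at compile time from the declared C type of the index; here a boolean flag stands in for that declaration, and `resolve_signed`, `resolve_unsigned` and `lookup` are hypothetical names used only to show where the extra branch sits.

```python
def resolve_signed(i, n):
    """Signed index: one extra branch to wrap negatives, Python-style."""
    if i < 0:
        i += n
    return i


def resolve_unsigned(i, n):
    """Unsigned index: cannot be negative by construction, so no branch."""
    return i


def lookup(buf, i, index_is_unsigned):
    """Pick the path from the (declared) signedness of the index."""
    n = len(buf)
    if index_is_unsigned:
        offset = resolve_unsigned(i, n)
    else:
        offset = resolve_signed(i, n)
    return buf[offset]


data = [10, 20, 30, 40]
print(lookup(data, 1, index_is_unsigned=True))    # fast path
print(lookup(data, -1, index_is_unsigned=False))  # wrapped to the last element
```

The appeal of keying on the index type is that no new syntax or pragma is needed: the information is already in the variable's declaration.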
Re: [Numpy-discussion] On my Cython/NumPy project
2008/6/21 Robert Kern [EMAIL PROTECTED]:
> On Sat, Jun 21, 2008 at 17:08, Anne Archibald [EMAIL PROTECTED] wrote:
>
>> My suggestion is this: allow negative indices, accepting the cost in tight loops. (If bounds checking is enabled, the cost will be negligible anyway.) Provide a #pragma allowing the user to assert that a certain piece of code uses no negative indices.
>
> Instead of a #pragma, you could rely on the type of the index. If it is unsigned, you can do the fast path; if it is signed, you need to check for and handle potential negatives.

Cute! And then it's easy to make

    for i in range(n)

produce unsigned results with no effort on the part of the user.

Anne
Re: [Numpy-discussion] On my Cython/NumPy project
On Sat, Jun 21, 2008 at 18:45, Anne Archibald [EMAIL PROTECTED] wrote:
> 2008/6/21 Robert Kern [EMAIL PROTECTED]:
>> On Sat, Jun 21, 2008 at 17:08, Anne Archibald [EMAIL PROTECTED] wrote:
>>
>>> My suggestion is this: allow negative indices, accepting the cost in tight loops. (If bounds checking is enabled, the cost will be negligible anyway.) Provide a #pragma allowing the user to assert that a certain piece of code uses no negative indices.
>>
>> Instead of a #pragma, you could rely on the type of the index. If it is unsigned, you can do the fast path; if it is signed, you need to check for and handle potential negatives.
>
> Cute! And then it's easy to make for i in range(n) produce unsigned results with no effort on the part of the user.

Not entirely. You still need to declare cdef unsigned int i. Cython does not translate the for loop into fast C unless the variable has been declared.

--
Robert Kern