Re: [Numpy-discussion] Cython numerical syntax revisited

2009-03-06 Thread Matthew Brett
Hi,

 The idea behind the current syntax was to keep things as close as
 possible to Python/NumPy, and only provide some hints to Cython for
 optimization. My problem with this now is that a) it's too easy to get
 non-optimized code without a warning by letting in untyped indices, b) I
 think the whole thing is a bit too "magic" and that it is too unclear
 what is going on to newcomers (though I'm guessing there).

 My proposal: Introduce an explicit buffer syntax:

 arr = np.zeros(..)
 cdef int[:,:] buf = arr # 2D buffer

I like this proposal a lot; it seems a great deal clearer to me than
the earlier syntax, and it helps me think of the new Cython machinery
in a different and more natural way.

Best,

Matthew


Re: [Numpy-discussion] Cython numerical syntax revisited

2009-03-06 Thread Anne Archibald
2009/3/5 Francesc Alted fal...@pytables.org:
 On Thursday 05 March 2009, Francesc Alted wrote:
 Well, I suppose that, provided that Cython could perform the for-loop
 transformation, giving support for strided arrays would be relatively
 trivial, and the performance would be similar to numexpr in this
 case.

 Mmh, perhaps not so trivial, because that implies that the stride of an
 array should be known at compile time, and that would require a new
 qualifier when declaring the array.  Tricky...

Not necessarily. You can transform

a[1,2,3]

into

*(a.data + 1*a.strides[0] + 2*a.strides[1] + 3*a.strides[2])

without any need for static information beyond that a is
3-dimensional. This would already be valuable, though perhaps you'd
want to be able to declare that a particular dimension had stride 1 to
simplify things. You could then use this same implementation to add
automatic iteration.
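
A quick NumPy session can check the arithmetic (illustration only, not
generated code; np.frombuffer reads straight out of a's buffer):

    import numpy as np

    a = np.arange(24, dtype=np.int64).reshape(2, 3, 4)
    # Byte offset of a[1, 2, 3], computed from the strides alone:
    offset = 1 * a.strides[0] + 2 * a.strides[1] + 3 * a.strides[2]
    # Read the element directly out of the underlying buffer:
    value = np.frombuffer(a, dtype=a.dtype, count=1, offset=offset)[0]
    assert value == a[1, 2, 3]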

Anne


 Cheers,

 --
 Francesc Alted


Re: [Numpy-discussion] Cython numerical syntax revisited

2009-03-05 Thread Francesc Alted
On Thursday 05 March 2009, Dag Sverre Seljebotn wrote:
 But yes, to implement that one would need to reimplement parts of
 NumPy to get it working. But because code would be generated
 specifically for the situation inline, I think it would be more like
 reimplementing Numexpr than reimplementing NumPy. I think one could
 simply invoke Numexpr as a first implementation (and make it an
 optional Cython plugin).

At first sight, having a kind of Numexpr kernel inside Cython would be 
great, but provided that you can already call Numexpr from both 
Python/Cython, I wonder what the advantage of doing so would be.  As I 
see it, it would be better to have:

c = numexpr.evaluate("a + b")

in the middle of Cython code than just:

c = a + b

in the sense that the former would allow the programmer to see whether 
Numexpr is called explicitly or not.

One should not forget that Numexpr only starts to be competitive for 
expressions whose array operands+result sizes are around the CPU cache 
size or larger (unless transcendental functions are used and the local 
Numexpr has support for Intel VML, in which case this threshold can be 
substantially lower).  So, getting Numexpr (or the Cython implementation 
of it) automatically called for *every* expression would not necessarily 
be a Good Thing, IMO.
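
For reference, a minimal sketch of the explicit style (numexpr.evaluate()
takes the expression as a string and picks the operands up from the
calling scope):

    import numpy as np
    import numexpr

    a = np.arange(1e6)
    b = np.arange(1e6)
    c = numexpr.evaluate("a + b")   # explicit: the reader sees Numexpr
    c2 = a + b                      # plain NumPy equivalent
    assert np.allclose(c, c2)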

Cheers,

-- 
Francesc Alted


Re: [Numpy-discussion] Cython numerical syntax revisited

2009-03-05 Thread Dag Sverre Seljebotn
Francesc Alted wrote:
 On Thursday 05 March 2009, Dag Sverre Seljebotn wrote:
 But yes, to implement that one would need to reimplement parts of
 NumPy to get it working. But because code would be generated
 specifically for the situation inline, I think it would be more like
 reimplementing Numexpr than reimplementing NumPy. I think one could
 simply invoke Numexpr as a first implementation (and make it an
 optional Cython plugin).
 
 At first sight, having a kind of Numexpr kernel inside Cython would be 
 great, but provided that you can already call Numexpr from both 
 Python/Cython, I wonder what the advantage of doing so would be.  As I 
 see it, it would be better to have:
 
 c = numexpr.evaluate("a + b")
 
 in the middle of Cython code than just:
 
 c = a + b
 
 in the sense that the former would allow the programmer to see whether 
 Numexpr is called explicitly or not.

The former would need to invoke the parser etc., which one would *not* 
need to do when one has the Cython compilation step. When I mention 
numexpr it is simply because work has already gone into it to optimize 
these things; that experience could hopefully be kept, while discarding 
the parser and opcode system.

I know too little about these things, but look:

Cython can relatively easily transform things like

cdef int[:,:] a = ..., b = ...
c = a + b * b

into a double for-loop with c[i,j] = a[i,j] + b[i,j] * b[i,j] at its 
core. A little more work could have it iterate the smallest dimension 
innermost dynamically (in strided mode).
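
As a hand-written sketch (not actual compiler output), the expansion
could look like:

    cdef int[:,:] a = ..., b = ..., c = ...
    cdef Py_ssize_t i, j
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            c[i, j] = a[i, j] + b[i, j] * b[i, j]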

If a and b are declared as "contiguous" arrays and "restrict", I suppose 
the C compiler could do the most efficient thing in a lot of cases? 
(I.e. cdef restrict int[:,:,c] or similar)

However if one has a strided array, numexpr could still give an 
advantage over such a loop. Or?

But anyway, this is easily one year ahead of us, unless more numerical 
Cython developers show up.

-- 
Dag Sverre


Re: [Numpy-discussion] Cython numerical syntax revisited

2009-03-05 Thread Francesc Alted
On Thursday 05 March 2009, Dag Sverre Seljebotn wrote:
  At first sight, having a kind of Numexpr kernel inside Cython would
  be great, but provided that you can already call Numexpr from both
  Python/Cython, I wonder what the advantage of doing so would be.  As
  I see it, it would be better to have:
 
  c = numexpr.evaluate("a + b")
 
  in the middle of Cython code than just:
 
  c = a + b
 
  in the sense that the former would allow the programmer to see
  whether Numexpr is called explicitly or not.

 The former would need to invoke the parser etc., which one would
 *not* need to do when one has the Cython compilation step.

Ah, yes.  That's a good point.

 When I 
 mention numexpr it is simply because work has already gone into it
 to optimize these things; that experience could hopefully be kept,
 while discarding the parser and opcode system.

 I know too little about these things, but look:

 Cython can relatively easily transform things like

 cdef int[:,:] a = ..., b = ...
 c = a + b * b

 into a double for-loop with c[i,j] = a[i,j] + b[i,j] * b[i,j] at its
 core. A little more work could have it iterate the smallest dimension
 innermost dynamically (in strided mode).

 If a and b are declared as "contiguous" arrays and "restrict", I
 suppose the C compiler could do the most efficient thing in a lot of
 cases? (I.e. cdef restrict int[:,:,c] or similar)

Agreed.


 However if one has a strided array, numexpr could still give an
 advantage over such a loop. Or?

Well, I suppose that, provided that Cython could perform the for-loop 
transformation, giving support for strided arrays would be relatively 
trivial, and the performance would be similar to numexpr in this 
case.

The case for unaligned arrays would be a bit different, as the following 
trick is used: whenever an unaligned array is detected, a new 'copy' 
opcode is issued so that, for each data block, a copy is done in order 
to make the data aligned.  As the block sizes are chosen to fit easily 
in the CPU's level-1 cache, this copy operation is very fast and has 
rather little impact on performance.
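
In rough Python pseudo-code, the trick looks more or less like this (a
sketch only, for 1-D operands; np.ascontiguousarray stands in for the
real 'copy' opcode machinery):

    import numpy as np

    def blocked_add(a, b, out, block=4096):
        # Process the operands in blocks sized to fit in the L1 cache;
        # the explicit copy realigns each block before operating on it.
        for start in range(0, a.size, block):
            stop = start + block
            ab = np.ascontiguousarray(a[start:stop])  # the "copy" opcode
            bb = np.ascontiguousarray(b[start:stop])
            out[start:stop] = ab + bb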

As I see it, this would be the only situation that would be more 
complicated to implement natively in Cython, because it requires 
non-trivial code for both blocking and handling opcodes.  However, in 
most situations my guess is that unaligned array operands do not 
appear, so perhaps the unaligned-case optimization would not be so 
important to implement in Cython.

 But anyway, this is easily one year ahead of us, unless more
 numerical Cython developers show up.

Cheers,

-- 
Francesc Alted


Re: [Numpy-discussion] Cython numerical syntax revisited

2009-03-05 Thread Francesc Alted
On Thursday 05 March 2009, Francesc Alted wrote:
 Well, I suppose that, provided that Cython could perform the for-loop
 transformation, giving support for strided arrays would be relatively
 trivial, and the performance would be similar to numexpr in this
 case.

Mmh, perhaps not so trivial, because that implies that the stride of an 
array should be known at compile time, and that would require a new 
qualifier when declaring the array.  Tricky...

Cheers,

-- 
Francesc Alted


Re: [Numpy-discussion] Cython numerical syntax revisited

2009-03-05 Thread Sturla Molden
On 3/5/2009 8:51 AM, Dag Sverre Seljebotn wrote:

 What's your take on Blitz++? Around here when you say C++ and numerical 
 in the same sentence, Blitz++ is what they mean.

I have not looked at it for a long time (8 years or so). It is based on 
profane C++ templates that make debugging impossible. The compiler does 
not emit meaningful diagnostic messages, and very often the compiler 
cannot tell on which line the error occurred. It was efficient for small 
arrays if loops could be completely unrolled by the template 
metaprogram. For large arrays, it produced intermediate arrays, as no C++ 
compiler could do escape analysis.

 Introducing this syntax would actually mean less time to focus on real 
 usability issues like that. OTOH, if the syntax I propose is superior, 
 it's better to introduce it early in a long-term perspective.

There is not much difference between

cdef int[:,:] array

and

cdef numpy.ndarray[int, dim=2] array

except that the latter is a Python object. The only minor issue with 
that is the GIL. On the other hand, the former is not a Python object, 
which means it is not garbage collected.


S.M.


Re: [Numpy-discussion] Cython numerical syntax revisited

2009-03-05 Thread Dag Sverre Seljebotn
Francesc Alted wrote:
 On Thursday 05 March 2009, Francesc Alted wrote:

 Well, I suppose that, provided that Cython could perform the for-loop
 transformation, giving support for strided arrays would be relatively
 trivial, and the performance would be similar to numexpr in this
 case.
 

 Mmh, perhaps not so trivial, because that implies that the stride of an 
 array should be known at compile time, and that would require a new 
 qualifier when declaring the array.  Tricky...
   
No, one could do the same thing that NumPy does (I think, never looked 
into it in detail), i.e.:

decide on the dimension to iterate innermost dynamically, from strides and sizes
save the stride in that dimension for each array
for loop using an n-dimensional iterator with larger per-loop overhead:
    save offsets
    for loop on the innermost dimension with lower per-loop overhead:
        component-wise operation using offsets and innermost strides
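
In Python terms, for the 2-D case (a sketch, with the dimension choice
made from the strides at runtime):

    def add2d(a, b, out):
        # Choose the axis with the smallest stride as the innermost loop.
        inner = 0 if abs(a.strides[0]) < abs(a.strides[1]) else 1
        outer = 1 - inner
        idx = [0, 0]
        for i in range(a.shape[outer]):        # larger per-loop overhead
            idx[outer] = i
            for j in range(a.shape[inner]):    # tight inner loop
                idx[inner] = j
                t = tuple(idx)
                out[t] = a[t] + b[t]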

Dag Sverre


Re: [Numpy-discussion] Cython numerical syntax revisited

2009-03-05 Thread Dag Sverre Seljebotn
Sturla Molden wrote:
 Introducing this syntax would actually mean less time to focus on real 
 usability issues like that. OTOH, if the syntax I propose is superior, 
 it's better to introduce it early in a long-term perspective.
 

 There is not much difference between

 cdef int[:,:] array

 and

 cdef numpy.ndarray[int, dim=2] array

 except that the latter is a Python object. The only minor issue with 
 that is the GIL. On the other hand, the former is not a Python object, 
 which means it is not garbage collected.
   
As with all syntax, the difference is mostly psychological. The former 
means "now I need fast access and will want to hit the metal, and will 
no longer look at my array through a NumPy object but through a buffer 
view", whereas the latter means "let Cython optimize some of the NumPy 
operations".

About garbage collection, int[:,:] would always be a buffer view onto an 
underlying object which *would* be garbage collected. I.e. it is *not* 
stack-allocated memory; so when you do

cdef np.int_t[:,:] arr = np.zeros((10,10), np.int)

then the memory of the array is garbage collected insofar as the result 
of np.zeros is. arr simply adds a reference to the underlying object 
(and slices add another reference, and so on).
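
One can see the same mechanism with Python's built-in memoryview today
(CPython only, just to illustrate the extra reference):

    import sys
    import numpy as np

    a = np.zeros((10, 10))
    before = sys.getrefcount(a)
    view = memoryview(a)    # the view holds a reference to a
    assert sys.getrefcount(a) == before + 1
    del view                # dropping the view releases the reference
    assert sys.getrefcount(a) == before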

Support for GIL-less programming is on the wanted-list anyway for both 
syntaxes though; Cython can know when one does something illegal and only 
let through certain uses of the variable, so both syntaxes work for this.

Dag Sverre


Re: [Numpy-discussion] Cython numerical syntax revisited

2009-03-05 Thread Sturla Molden
On 3/5/2009 10:11 AM, Dag Sverre Seljebotn wrote:


 Cython can relatively easily transform things like
 
 cdef int[:,:] a = ..., b = ...
 c = a + b * b

Now you are wandering far into Fortran territory...


 If a and b are declared as "contiguous" arrays and "restrict", I suppose 
 the C compiler could do the most efficient thing in a lot of cases? 
 (I.e. cdef restrict int[:,:,c] or similar)

A Fortran compiler can compile a vectorized expression like

   a = b*c(i,:) + sin(k)

into

   do j = 1, n
      a(j) = b(j)*c(i,j) + sin(k(j))
   end do

The compiler can do this because Fortran has strict rules prohibiting 
aliasing, and because the intrinsic function sin is declared 
'elemental'. On the other hand, if the expression contains functions not 
declared 'elemental' or 'pure', there may be side effects, and temporary 
copies must be made. The same could happen if the expression contained 
variables declared 'pointer', in which case it could contain aliases.

I think in the case of numexpr, it assumes that NumPy ufuncs are 
elemental like Fortran intrinsics.

Matlab's JIT compiler works with the assumption that arrays are 
inherently immutable (everything has copy-on-write semantics). That 
makes life easier.


 However if one has a strided array, numexpr could still give an 
 advantage over such a loop. Or?

Fortran compilers often make copies of strided arrays. The trick is to 
make sure the working arrays fit in cache. Again, this is safe when the 
expression only contains 'elemental' or 'pure' functions. Fortran also 
often does copy-in copy-out if a function is called with a strided 
array as argument.

S.M.


Re: [Numpy-discussion] Cython numerical syntax revisited

2009-03-05 Thread Francesc Alted
On Thursday 05 March 2009, Dag Sverre Seljebotn wrote:
 No, one could do the same thing that NumPy does (I think, never
 looked into it in detail), i.e.:

 decide on the dimension to iterate innermost dynamically, from strides and sizes
 save the stride in that dimension for each array
 for loop using an n-dimensional iterator with larger per-loop overhead:
     save offsets
     for loop on the innermost dimension with lower per-loop overhead:
         component-wise operation using offsets and innermost strides

I see.  Yes, it seems definitely doable.  However, I don't understand 
very well what you mean when you say that you have to decide on the 
dimension to iterate innermost dynamically.  For me, this dimension 
should always be the trailing dimension, in order to maximize the 
locality of data.  Or am I missing something?

-- 
Francesc Alted


Re: [Numpy-discussion] Cython numerical syntax revisited

2009-03-05 Thread Dag Sverre Seljebotn
Francesc Alted wrote:
 On Thursday 05 March 2009, Dag Sverre Seljebotn wrote:

 No, one could do the same thing that NumPy does (I think, never
 looked into it in detail), i.e.:

 decide on the dimension to iterate innermost dynamically, from strides and sizes
 save the stride in that dimension for each array
 for loop using an n-dimensional iterator with larger per-loop overhead:
     save offsets
     for loop on the innermost dimension with lower per-loop overhead:
         component-wise operation using offsets and innermost strides
 

 I see.  Yes, it seems definitely doable.  However, I don't understand 
 very well what you mean when you say that you have to decide on the 
 dimension to iterate innermost dynamically.  For me, this dimension 
 should always be the trailing dimension, in order to maximize the 
 locality of data.  Or am I missing something?
   
For a transposed array (or Fortran-ordered one) it will be the leading 
dimension. Not sure whether it is possible with other kinds of views 
(where e.g. a middle dimension varies fastest), but the NumPy model 
doesn't preclude it and I suppose it would be possible with stride_tricks.
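
For instance, something along these lines with stride_tricks (strides
are given in bytes):

    import numpy as np
    from numpy.lib.stride_tricks import as_strided

    base = np.arange(24, dtype=np.int64)
    itm = base.itemsize
    # A (2, 4, 3) view in which the *middle* axis has the smallest
    # stride, i.e. varies fastest in memory:
    weird = as_strided(base, shape=(2, 4, 3),
                       strides=(12 * itm, 1 * itm, 4 * itm))
    assert weird[1, 2, 0] == base[14]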

Dag Sverre


Re: [Numpy-discussion] Cython numerical syntax revisited

2009-03-05 Thread Francesc Alted
On Thursday 05 March 2009, Dag Sverre Seljebotn wrote:
 Francesc Alted wrote:
  On Thursday 05 March 2009, Dag Sverre Seljebotn wrote:
  No, one could do the same thing that NumPy does (I think, never
  looked into it in detail), i.e.:

  decide on the dimension to iterate innermost dynamically, from strides and sizes
  save the stride in that dimension for each array
  for loop using an n-dimensional iterator with larger per-loop overhead:
      save offsets
      for loop on the innermost dimension with lower per-loop overhead:
          component-wise operation using offsets and innermost strides
 
  I see.  Yes, it seems definitely doable.  However, I don't
  understand very well what you mean when you say that you have to
  decide on the dimension to iterate innermost dynamically.  For me,
  this dimension should always be the trailing dimension, in order to
  maximize the locality of data.  Or am I missing something?

 For a transposed array (or Fortran-ordered one) it will be the
 leading dimension.

Good point.  I was not aware of this subtlety.  In fact, numexpr does 
not get along well with transposed views of NumPy arrays.  Filed the bug 
at:

http://code.google.com/p/numexpr/issues/detail?id=18

 Not sure whether it is possible with other kinds of views 
 (where e.g. a middle dimension varies fastest), but the NumPy model
 doesn't preclude it and I suppose it would be possible with
 stride_tricks.

Middle dimensions varying fastest?  Oh my! :)

-- 
Francesc Alted


Re: [Numpy-discussion] Cython numerical syntax revisited

2009-03-05 Thread Sturla Molden
 On Thursday 05 March 2009, Dag Sverre Seljebotn wrote:

 Good point.  I was not aware of this subtlety.  In fact, numexpr does
 not get along well with transposed views of NumPy arrays.  Filed the
 bug at:

 http://code.google.com/p/numexpr/issues/detail?id=18

 Not sure whether it is possible with other kinds of views
 (where e.g. a middle dimension varies fastest), but the NumPy model
 doesn't preclude it and I suppose it would be possible with
 stride_tricks.

 Middle dimensions varying fastest?  Oh my! :)


I cannot see any obvious justification for letting the middle dimension
vary fastest.

C ordering is natural if we work with a pointer to an array of pointers
or an array of arrays, which in both cases will be indexed as
array[i][j] in C:

   array[i][j] = (array[i])[j] = *(array[i]+j) = *(*(array+i)+j)

While C has arrays and pointers, the difference is almost never visible to
the programmer. This has led some to erroneously believe that C has no
arrays, only pointers. However:

   double array[512];
   double *p = array;

Now sizeof(array) will be sizeof(double)*512, whereas sizeof(p) will be
the size of a pointer. This is one of very few cases where C arrays and
pointers behave differently, but it demonstrates the existence of arrays in C.

The justification for Fortran ordering is in the mathematics. Say we have
a set of linear equations

   A * X = B

and are going to solve for X, using some magical subroutine 'solve'. The
most efficient way to store these arrays becomes the Fortran ordering.
That is,

   call solve(A, X, B)

will be mathematically equivalent to the loop

   do i = 1, n
      call solve(A, X(:,i), B(:,i))
   end do

All the arrays in the calls to solve are still kept contiguous! This would
not be the case with C ordering, which is an important reason that C sucks
so much for numerical computing. To write efficient linear algebra in C,
we have to store matrices in memory transposed to how they appear in
mathematical equations. In fact, Matlab uses Fortran ordering because of
this.
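
The effect is easy to see from NumPy, which supports both orderings
(illustration only):

    import numpy as np

    A = np.zeros((4, 4), order='F')   # Fortran (column-major) layout
    B = np.zeros((4, 4), order='C')   # C (row-major) layout
    # In Fortran order a column is contiguous; in C order a row is:
    assert A[:, 1].flags['C_CONTIGUOUS']
    assert B[1, :].flags['C_CONTIGUOUS']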

While C ordering feels natural to computer scientists, who love the
beauty of pointer and array symmetries, it is a major obstacle for
scientists and engineers from other fields. It is perhaps the most
important reason why numerical code written in Fortran tends to be
faster: if a matrix is n x m in the equation, it should be n x m
in the program as well, right? Not so with C. The better performance of
Fortran for numerical code is often blamed on pointer aliasing in C. I
believe wrong memory layout at the hands of the programmer is actually the
more important reason. In fact, whenever I have done comparisons, the C
compiler has always produced the faster machine code (gcc vs. gfortran or
g77, icc vs. ifort). But to avoid the pitfall, one must be aware of it.
And when a programmer's specialization is in another field, this is
usually not the case. Most scientists doing some casual C, Java or Python
programming fall straight into the trap. That is also why I personally
feel it was a bad choice to let C ordering be the default in NumPy.


S.M.


Re: [Numpy-discussion] Cython numerical syntax revisited

2009-03-05 Thread David Cournapeau
Sturla Molden wrote:
 The justification for Fortran ordering is in the mathematics. Say we have
 a set of linear equations

A * X = B

 and are going to solve for X, using some magical subroutine 'solve'. The
 most efficient way to store these arrays becomes the Fortran ordering.
 That is,

call solve(A, X, B)

 will be mathematically equivalent to the loop

 do i = 1, n
    call solve(A, X(:,i), B(:,i))
 end do

 All the arrays in the calls to solve are still kept contiguous! This would
 not be the case with C ordering, which is an important reason that C sucks
 so much for numerical computing. To write efficient linear algebra in C,
 we have to store matrices in memory transposed to how they appear in
 mathematical equations. In fact, Matlab uses Fortran ordering because of
 this.
   

I don't think that's true: I am pretty sure Matlab follows Fortran
ordering because Matlab was born as a shell around BLAS, LINPACK and co.

I don't understand your argument about the row vs. column matter: which one
is best depends on your linear algebra equations. You give an example
where Fortran is better, but I can find examples where C order will be
more appropriate. Most of the time, for anything non-trivial, which one
is best depends on the dimensions of the problem (Kalman filtering in
high-dimensional spaces, for example), because some parts of the
equations are better handled in a row-order fashion, and some other
parts in a column-order fashion.

I don't know whether this is true, but I have read several times that
the column order in Fortran is historical and due to some specificities
of the early IBM machines - but I of course have no idea what the hardware
in 1954 looked like from a programming point of view :) This does not
prevent it from being a happy accident with regard to performance, though.

cheers,

David


Re: [Numpy-discussion] Cython numerical syntax revisited

2009-03-05 Thread Francesc Alted
On Thursday 05 March 2009, David Cournapeau wrote:
 I don't understand your argument about the row vs. column matter: which
 one is best depends on your linear algebra equations. You give an
 example where Fortran is better, but I can find examples where C order
 will be more appropriate. Most of the time, for anything non-trivial,
 which one is best depends on the dimensions of the problem (Kalman
 filtering in high-dimensional spaces, for example), because some parts
 of the equations are better handled in a row-order fashion, and some
 other parts in a column-order fashion.

Yeah.  Yet another (simple) example coming from linear algebra: a matrix 
multiplied by a vector.  Given a (matrix):

a = [[0, 1, 2],
     [3, 4, 5],
     [6, 7, 8]]

and b (vector):

b = [[1],
     [2],
     [3]]

the most intuitive way to do the multiplication is to take the 1st row 
of a and do a dot product against b, repeating the process for the 2nd 
and 3rd rows of a.  C order coincides with this rule, and it is optimal 
from the point of view of memory access, while Fortran order is not.
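
In NumPy terms (illustration only; the row-wise traversal walks a
C-ordered a contiguously):

    import numpy as np

    a = np.array([[0, 1, 2],
                  [3, 4, 5],
                  [6, 7, 8]])
    b = np.array([1, 2, 3])

    # Row-by-row dot products, exactly the intuitive rule above:
    result = np.array([np.dot(row, b) for row in a])
    assert np.array_equal(result, np.dot(a, b))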

-- 
Francesc Alted


[Numpy-discussion] Cython numerical syntax revisited

2009-03-04 Thread Dag Sverre Seljebotn
This is NOT yet discussed on the Cython list; I wanted to check with 
more numerical users to see if the issue should even be brought up there.

The idea behind the current syntax was to keep things as close as 
possible to Python/NumPy, and only provide some hints to Cython for 
optimization. My problem with this now is that a) it's too easy to get 
non-optimized code without a warning by letting in untyped indices, b) I 
think the whole thing is a bit too "magic" and that it is too unclear 
what is going on to newcomers (though I'm guessing there).

My proposal: Introduce an explicit buffer syntax:

arr = np.zeros(..)
cdef int[:,:] buf = arr # 2D buffer

Here, buf would be something other than arr; it is a separate view onto the 
array for low-level purposes.

This has certain disadvantages; consider:

a1 = np.zeros(...) + 1; a2 = np.zeros(...) + 2
cdef int[:] b1 = a1, b2 = a2

Here, one would need to use b1 and b2 for for-loop arithmetic, but a1 
and a2 for vectorized operations and slicing. b1 + b2 would mean 
something else, not NumPy-related (at first disallowed, but see below).
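
Concretely, the split would look something like this (a sketch in the
proposed syntax, reusing a1/a2/b1/b2 from above):

    cdef Py_ssize_t i
    for i in range(b1.shape[0]):   # element-wise work uses the buffers
        b1[i] = b1[i] + b2[i]
    a3 = a1 + a2                   # vectorized work uses the ndarrays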

"print b1" would likely coerce b1 to a Python memoryview and print 
"<memoryview ...>" (at least on newer Python versions); one would need 
to use some function to get b1 back to a NumPy array.
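
With today's Python, that round-trip could look like this (illustration
only; np.asarray wraps a buffer without copying):

    import numpy as np

    b1 = memoryview(np.zeros((3, 3)))
    print(b1)              # -> <memory at 0x...>
    arr = np.asarray(b1)   # back to an ndarray view of the same buffer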

Advantages:
- More explicit
- Leaves a path open in the syntax for introducing low-level slicing and 
arithmetic as separate operations in Cython independent of NumPy (think 
Numexpr compile-time folded into Cython code).
- Possible to have some creative syntax like int[0:] for disallowing 
negative wraparound and perhaps even int[-3:] for non-zero-based indexing.

More details: http://wiki.cython.org/enhancements/buffersyntax

(Of course, compatibility with existing code will be taken seriously no 
matter how this plays out!)

-- 
Dag Sverre


Re: [Numpy-discussion] Cython numerical syntax revisited

2009-03-04 Thread Dag Sverre Seljebotn
Stéfan van der Walt wrote:
 Hi Dag
 
 2009/3/5 Dag Sverre Seljebotn da...@student.matnat.uio.no:
 More details: http://wiki.cython.org/enhancements/buffersyntax
 
 Interesting proposal!  Am I correct in thinking that you'd have to
 re-implement a lot of NumPy yourself to get this working?  Or are you
 planning to build on NumPy + C-API?

First off, the proposal now is simply to change the syntax for existing 
features, which would disable arithmetic and slicing this time 
around. Slicing could perhaps happen over summer, but arithmetic would 
likely not happen for some time. The only point now is that before the 
syntax and work habits are *too* fixed, one could leave the road more 
open for it.

But yes, to implement that one would need to reimplement parts of NumPy 
to get it working. But because code would be generated specifically for 
the situation inline, I think it would be more like reimplementing 
Numexpr than reimplementing NumPy. I think one could simply invoke 
Numexpr as a first implementation (and make it an optional Cython plugin).

The fact that any performance improvements could not then be done 
incrementally and transparently, though, certainly speaks against the 
proposed syntax/semantics.

-- 
Dag Sverre


Re: [Numpy-discussion] Cython numerical syntax revisited

2009-03-04 Thread Dag Sverre Seljebotn
Sturla Molden wrote:
 arr = np.zeros(..)
 cdef int[:,:] buf = arr # 2D buffer

 Here, buf would be something other than arr; it is a separate view onto the
 array for low-level purposes.
 
 I like your proposal. The reason we use Fortran for numerical computing is
 that Fortran makes it easy to manipulate arrays. C or C++ sucks terribly
 for anything related to numerical computing, and arrays in particular.

What's your take on Blitz++? Around here when you say C++ and numerical 
in the same sentence, Blitz++ is what they mean.

 Cython is currently not better than C. The ndarray syntax you added last
 summer is useless if we need to pass the array or a view/slice to another
 function. That is almost always the case. While the syntax is there, the

This can be fixed with the existing syntax:

http://trac.cython.org/cython_trac/ticket/177

Introducing this syntax would actually mean less time to focus on real 
usability issues like that. OTOH, if the syntax I propose is superior, 
it's better to introduce it early in a long-term perspective.

 Fortran 90/95 does this already, which is a major reason for choosing it
 for numerical computing. If you have something like this working, I
 believe many scientists would be happy to retire Fortran. It's not that
 anyone likes it that much. Anyhow, I don't see myself retiring Fortran and
 f2py any time soon.

That's certainly an interesting perspective. Requires a lot of work 
though :-)

-- 
Dag Sverre


Re: [Numpy-discussion] Cython numerical syntax revisited

2009-03-04 Thread Dag Sverre Seljebotn
Andrew Straw wrote:
 Dag Sverre Seljebotn wrote:
 This is NOT yet discussed on the Cython list; I wanted to check with 
 more numerical users to see if the issue should even be brought up there.

 The idea behind the current syntax was to keep things as close as 
 possible to Python/NumPy, and only provide some hints to Cython for 
 optimization. My problem with this now is that a) it's too easy to get 
 non-optimized code without a warning by letting in untyped indices, b) I 
 think the whole thing is a bit too "magic" and that it is too unclear 
 what is going on to newcomers (though I'm guessing there).
 
 These may be issues, but I think keeping "cython -a my_module.pyx" in 
 one's development cycle and inspecting the output will lead to great 
 enlightenment on the part of the Cython user. Perhaps this should be 
 advertised more prominently? I always do this with any Cython-generated 
 code, and it works wonders.

Well, I do so too (or rather just open the generated C file in emacs, 
but since I'm working on Cython I'm more used to reading that garbage 
than others are, I suppose :-) ). But it feels like one extra step which 
must be done. A syntax highlighter in emacs highlighting C operations 
would help as well, though.

 My proposal: Introduce an explicit buffer syntax:

 arr = np.zeros(..)
 cdef int[:,:] buf = arr # 2D buffer
 
My initial reaction is that it seems to be a second implementation of 
buffer interaction in Cython, and therefore yet another thing to keep in 

Well, it would use much of the same implementation of course :-) It's 
more of a change in the parser and a few rules here and there than 
anything else.

 mind, and it's unclear to me how the behavior of the two approaches 
 might differ, perhaps in subtle ways. So that's a 
 disadvantage from my perspective. I agree that some of your ideas are 
 advantages, however. Also, it seems it would allow one to (more easily) 
 interact with buffer objects in sophisticated ways without needing the 
 GIL, which is another advantage.
 
 Could some or all of this be added to the current numpy buffer 
 implementation, or does it really need the new syntax?

The new features I mention? Most of it could be wedged into the existing 
syntax, but at the expense of making things even less clear (or at least 
I feel so) -- for instance, if we decided that Cython was to take care 
of operations like a + b in a NumExpr-like fashion, then it would mean 
that all declared buffers would get the Python versions of their 
arithmetic operators hidden and overridden.

To make a point, it would mean that the NumPy matrix object would 
suddenly get componentwise multiplication when assigned to an 
ndarray[int] variable! (Granted, my feeling is that the matrix class 
is better avoided anyway, but...)

Regarding passing buffers to C/Fortran, it's just a matter of coming up 
with a nice syntax.

 Also, is there anything possible with buffer objects that would be 
 limited by the choice of syntax you propose? I imagine this might not 
 work with structured data types (then again, it might...).

mystruct[:,:] should just work, if that is what you mean. It's simply a 
matter of a) adding something to the parser, and b) disabling the current 
pass-through to the Python runtime of what the buffer cannot handle.

-- 
Dag Sverre