Re: [Numpy-discussion] Failure to build numpy 1.6.1

2011-11-08 Thread David Cournapeau
On Tue, Nov 8, 2011 at 9:01 AM, David Cournapeau courn...@gmail.com wrote:
 Hi Mads,

 On Tue, Nov 8, 2011 at 8:40 AM, Mads Ipsen madsip...@gmail.com wrote:
 Hi,

 I am trying to build numpy-1.6.1 with the following gcc compiler specs:

 Reading specs from /usr/lib/gcc/x86_64-redhat-linux/3.4.6/specs
 Configured with: ../configure --prefix=/usr --mandir=/usr/share/man
 --infodir=/usr/share/info --enable-shared --enable-threads=posix
 --disable-checking --with-system-zlib --enable-__cxa_atexit
 --disable-libunwind-exceptions --enable-java-awt=gtk
 --host=x86_64-redhat-linux
 Thread model: posix
 gcc version 3.4.6 20060404 (Red Hat 3.4.6-11)

 I get the following error (any clues at what goes wrong)?

 This looks like a compiler bug (gcc 3.4 is really old). einsum uses
 SSE intrinsics, and old gcc implementations are quite buggy in that
 area.

 Could you try adding the following at line 38:

 #define EINSUM_USE_SSE1 0
 #define EINSUM_USE_SSE2 0

I meant to add this in the file
numpy/core/src/multiarray/einsum.c.src, and then rebuild numpy
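
As a quick post-rebuild sanity check (a minimal sketch, nothing official): einsum is the code path that uses those intrinsics, so comparing it against dot should flag any remaining miscompilation.

import numpy as np

# Minimal check after rebuilding with EINSUM_USE_SSE1/2 set to 0:
# einsum should agree with dot up to floating point rounding.
a = np.random.rand(50, 60)
b = np.random.rand(60, 40)
assert np.allclose(np.einsum('ij,jk->ik', a, b), np.dot(a, b))
print('einsum OK')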

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Failure to build numpy 1.6.1

2011-11-08 Thread David Cournapeau
On Tue, Nov 8, 2011 at 9:20 AM, Mads Ipsen madsip...@gmail.com wrote:

 Yup, that fixes it. For now, we can apply a temporary fix on our build
 system. Is this something that'll go into, say, 1.6.2?

That's more of a workaround than a fix. We need to decide whether we
disable intrinsics altogether or whether we want to drop support for
old compilers.

cheers,

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Memory hungry reduce ops in Numpy

2011-11-14 Thread David Cournapeau
On Mon, Nov 14, 2011 at 12:46 PM, Andreas Müller
amuel...@ais.uni-bonn.de wrote:
 Hi everybody.
 When I did some normalization using numpy, I noticed that numpy.std uses
 more RAM than I was expecting.
 A quick google search gave me this:
 http://luispedro.org/software/ncreduce
 The site claims that std and other reduce operations are implemented
 naively with many temporaries.
 Is that true? And if so, is there a particular reason for that?
 This issue seems quite easy to fix.
 In particular the link I gave above provides code.

The code provided only implements a few special cases: being more
efficient in those cases only is indeed easy.
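
For illustration only -- this is a minimal sketch, not the ncreduce code -- a chunked two-pass std avoids the full-size temporaries (a - mean, its square) that the naive implementation allocates:

import numpy as np

def chunked_std(a, chunk=65536):
    # Two-pass standard deviation that only allocates chunk-sized
    # temporaries instead of full-size intermediate arrays.
    a = a.ravel()
    n = a.size
    mean = 0.0
    for i in range(0, n, chunk):
        mean += a[i:i + chunk].sum()
    mean /= n
    ssd = 0.0
    for i in range(0, n, chunk):
        d = a[i:i + chunk] - mean
        ssd += np.dot(d, d)
    return np.sqrt(ssd / n)

x = np.random.rand(10**6)
print('%r %r' % (np.std(x), chunked_std(x)))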

cheers,

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Odd-looking long double on windows 32 bit

2011-11-14 Thread David Cournapeau
On Mon, Nov 14, 2011 at 9:01 PM, Matthew Brett matthew.br...@gmail.com wrote:
 Hi,

 On Sun, Nov 13, 2011 at 5:03 PM, Charles R Harris
 charlesr.har...@gmail.com wrote:


 On Sun, Nov 13, 2011 at 3:56 PM, Matthew Brett matthew.br...@gmail.com
 wrote:

 Hi,

 On Sun, Nov 13, 2011 at 1:34 PM, Charles R Harris
 charlesr.har...@gmail.com wrote:
 
 
  On Sun, Nov 13, 2011 at 2:25 PM, Matthew Brett matthew.br...@gmail.com
  wrote:
 
  Hi,
 
  On Sun, Nov 13, 2011 at 8:21 AM, Charles R Harris
  charlesr.har...@gmail.com wrote:
  
  
   On Sun, Nov 13, 2011 at 12:57 AM, Matthew Brett
   matthew.br...@gmail.com
   wrote:
  
   Hi,
  
   On Sat, Nov 12, 2011 at 11:35 PM, Matthew Brett
   matthew.br...@gmail.com
   wrote:
Hi,
   
Sorry for my continued confusion here.  This is numpy 1.6.1 on
windows
XP 32 bit.
   
In [2]: np.finfo(np.float96).nmant
Out[2]: 52
   
In [3]: np.finfo(np.float96).nexp
Out[3]: 15
   
In [4]: np.finfo(np.float64).nmant
Out[4]: 52
   
In [5]: np.finfo(np.float64).nexp
Out[5]: 11
   
If there are 52 bits of precision, 2**53+1 should not be
representable, and sure enough:
   
In [6]: np.float96(2**53)+1
Out[6]: 9007199254740992.0
   
In [7]: np.float64(2**53)+1
Out[7]: 9007199254740992.0
   
If the nexp is right, the max should be higher for the float96
type:
   
In [9]: np.finfo(np.float64).max
Out[9]: 1.7976931348623157e+308
   
In [10]: np.finfo(np.float96).max
Out[10]: 1.#INF
   
I see that long double in C is 12 bytes wide, and double is the
usual
8
bytes.
  
   Sorry - sizeof(long double) is 12 using mingw.  I see that long
   double
   is the same as double in MS Visual C++.
  
   http://en.wikipedia.org/wiki/Long_double
  
   but, as expected from the name:
  
   In [11]: np.dtype(np.float96).itemsize
   Out[11]: 12
  
  
   Hmm, good point. There should not be a float96 on Windows using the
   MSVC
   compiler, and the longdouble types 'gG' should return float64 and
   complex128
   respectively. OTOH, I believe the mingw compiler has real float96
   types
   but
   I wonder about library support. This is really a build issue and it
   would be
   good to have some feedback on what different platforms are doing so
   that
   we
   know if we are doing things right.
 
  Is it possible that numpy is getting confused by being compiled with
  mingw on top of a visual studio python?
 
  Some further forensics seem to suggest that, despite the fact that the math
  suggests float96 is float64, the storage format is in fact 80-bit
  extended precision:
 
 
  Yes, extended precision is the type on Intel hardware with gcc, the
  96/128
  bits comes from alignment on 4 or 8 byte boundaries. With MSVC, double
  and
  long double are both ieee double, and on SPARC, long double is ieee quad
  precision.

 Right - but I think my researches are showing that the longdouble
 numbers are being _stored_ as 80 bit, but the math on those numbers is
 64 bit.

 Is there a reason that numpy can't do 80-bit math on these guys?  If
 there is, is there any point in having a float96 on windows?

 It's a compiler/architecture thing and depends on how the compiler
 interprets the long double c type. The gcc compiler does do 80 bit math on
 Intel/AMD hardware. MSVC doesn't, and probably never will. MSVC shouldn't
 produce float96 numbers, if it does, it is a bug. Mingw uses the gcc
 compiler, so the numbers are there, but if it uses the MS library it will
 have to convert them to double to do computations like sin(x) since there
 are no microsoft routines for extended precision. I suspect that gcc/ms
 combo is what is producing the odd results you are seeing.

 I think we might be talking past each other a bit.

 It seems to me that, if float96 must use float64 math, then it should
 be removed from the numpy namespace, because

If we were to do so, it would break too much code.


 a) It implies higher precision than float64 but does not provide it
 b) It uses more memory to no obvious advantage

There is an obvious advantage: to handle memory blocks which use long
double, created outside numpy (or even python).

Otherwise, while gcc indeed supports long double, the fact that the C
runtime doesn't really means it is hopeless to reach any kind of
consistency. And I will reiterate what I said before about long
double: if you care about your code behaving consistently across
platforms, just forget about long double.
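
To make the first point concrete, a minimal sketch (using nothing beyond np.longdouble itself) of viewing a raw buffer of the platform's long double, as a C library might hand it over:

import numpy as np

# Round-trip through a raw byte buffer, standing in for memory produced by
# a C library that uses 'long double'.  np.longdouble maps to whatever the
# platform provides: 80-bit extended padded to 96/128 bits with gcc on x86,
# plain IEEE double with MSVC, IEEE quad on SPARC.
a = np.arange(4, dtype=np.longdouble)
raw = a.tostring()          # bytes, as an external producer would supply them
b = np.frombuffer(raw, dtype=np.longdouble)
print('itemsize: %d' % np.dtype(np.longdouble).itemsize)
print(b)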

cheers,

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Odd-looking long double on windows 32 bit

2011-11-15 Thread David Cournapeau
On Tue, Nov 15, 2011 at 6:22 AM, Matthew Brett matthew.br...@gmail.com wrote:
 Hi,

 On Mon, Nov 14, 2011 at 10:08 PM, David Cournapeau courn...@gmail.com wrote:
 On Mon, Nov 14, 2011 at 9:01 PM, Matthew Brett matthew.br...@gmail.com 
 wrote:
 Hi,

 On Sun, Nov 13, 2011 at 5:03 PM, Charles R Harris
 charlesr.har...@gmail.com wrote:


 On Sun, Nov 13, 2011 at 3:56 PM, Matthew Brett matthew.br...@gmail.com
 wrote:

 Hi,

 On Sun, Nov 13, 2011 at 1:34 PM, Charles R Harris
 charlesr.har...@gmail.com wrote:
 
 
  On Sun, Nov 13, 2011 at 2:25 PM, Matthew Brett matthew.br...@gmail.com
  wrote:
 
  Hi,
 
  On Sun, Nov 13, 2011 at 8:21 AM, Charles R Harris
  charlesr.har...@gmail.com wrote:
  
  
   On Sun, Nov 13, 2011 at 12:57 AM, Matthew Brett
   matthew.br...@gmail.com
   wrote:
  
   Hi,
  
   On Sat, Nov 12, 2011 at 11:35 PM, Matthew Brett
   matthew.br...@gmail.com
   wrote:
Hi,
   
Sorry for my continued confusion here.  This is numpy 1.6.1 on
windows
XP 32 bit.
   
In [2]: np.finfo(np.float96).nmant
Out[2]: 52
   
In [3]: np.finfo(np.float96).nexp
Out[3]: 15
   
In [4]: np.finfo(np.float64).nmant
Out[4]: 52
   
In [5]: np.finfo(np.float64).nexp
Out[5]: 11
   
If there are 52 bits of precision, 2**53+1 should not be
representable, and sure enough:
   
In [6]: np.float96(2**53)+1
Out[6]: 9007199254740992.0
   
In [7]: np.float64(2**53)+1
Out[7]: 9007199254740992.0
   
If the nexp is right, the max should be higher for the float96
type:
   
In [9]: np.finfo(np.float64).max
Out[9]: 1.7976931348623157e+308
   
In [10]: np.finfo(np.float96).max
Out[10]: 1.#INF
   
I see that long double in C is 12 bytes wide, and double is the
usual
8
bytes.
  
   Sorry - sizeof(long double) is 12 using mingw.  I see that long
   double
   is the same as double in MS Visual C++.
  
   http://en.wikipedia.org/wiki/Long_double
  
   but, as expected from the name:
  
   In [11]: np.dtype(np.float96).itemsize
   Out[11]: 12
  
  
   Hmm, good point. There should not be a float96 on Windows using the
   MSVC
   compiler, and the longdouble types 'gG' should return float64 and
   complex128
   respectively. OTOH, I believe the mingw compiler has real float96
   types
   but
   I wonder about library support. This is really a build issue and it
   would be
   good to have some feedback on what different platforms are doing so
   that
   we
   know if we are doing things right.
 
  Is it possible that numpy is getting confused by being compiled with
  mingw on top of a visual studio python?
 
  Some further forensics seem to suggest that, despite the fact that the math
  suggests float96 is float64, the storage format is in fact 80-bit
  extended precision:
 
 
  Yes, extended precision is the type on Intel hardware with gcc, the
  96/128
  bits comes from alignment on 4 or 8 byte boundaries. With MSVC, double
  and
  long double are both ieee double, and on SPARC, long double is ieee quad
  precision.

 Right - but I think my researches are showing that the longdouble
 numbers are being _stored_ as 80 bit, but the math on those numbers is
 64 bit.

 Is there a reason that numpy can't do 80-bit math on these guys?  If
 there is, is there any point in having a float96 on windows?

 It's a compiler/architecture thing and depends on how the compiler
 interprets the long double c type. The gcc compiler does do 80 bit math on
 Intel/AMD hardware. MSVC doesn't, and probably never will. MSVC shouldn't
 produce float96 numbers, if it does, it is a bug. Mingw uses the gcc
 compiler, so the numbers are there, but if it uses the MS library it will
 have to convert them to double to do computations like sin(x) since there
 are no microsoft routines for extended precision. I suspect that gcc/ms
 combo is what is producing the odd results you are seeing.

 I think we might be talking past each other a bit.

 It seems to me that, if float96 must use float64 math, then it should
 be removed from the numpy namespace, because

 If we were to do so, it would break too much code.

 David - please - obviously I'm not suggesting removing it without
 deprecating it.

Let's say I find it debatable that removing it (with all the
deprecations) would be a good use of effort, especially given that
there is no obviously better choice to be made.


 a) It implies higher precision than float64 but does not provide it
 b) It uses more memory to no obvious advantage

 There is an obvious advantage: to handle memory blocks which use long
 double, created outside numpy (or even python).

 Right - but that's a bit arcane, and I would have thought
 np.longdouble would be a good enough name for that.   Of course, the
 users may be surprised, as I was, that memory allocated for higher
 precision is using float64, and that may take them some time to work
 out.  I'll say again that 'longdouble' says to me 'something specific
 to the compiler' and 'float96' says

Re: [Numpy-discussion] failure to register ufunc loops for user defined types

2011-12-05 Thread David Cournapeau
On Sun, Dec 4, 2011 at 9:45 PM, Charles R Harris
charlesr.har...@gmail.com wrote:


 We'll see how much interest there is. If it becomes official you may get
 more feedback on features. There are some advantages to having some user
 types in numpy. One is that otherwise they tend to get lost, another is that
 having a working example or two provides a template for others to work
 from, and finally they provide test material. Because official user types
 aren't assigned anywhere there might also be some conflicts. Maybe something
 like an extension types module would be a way around that. In any case, I
 think both rational numbers and quaternions would be useful to have and I
 hope there is some discussion of how to do that.

I agree that those will be useful, but I am worried about adding more
stuff in multiarray. User-types should really be separated from
multiarray. Ideally they should be plugins, but being separated from
multiarray would be a good first step.
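
For context, a minimal, runnable sketch of the status quo such user types would improve on: exact rationals currently have to go through dtype=object, which gives up vectorized loops and type checking:

import numpy as np
from fractions import Fraction

# Today, exact rational arithmetic in an array means dtype=object: every
# element is a Python object and every operation is a Python-level call.
# A dedicated rational dtype with registered ufunc loops would keep the
# arithmetic exact while staying at C speed.
a = np.array([Fraction(1, 3), Fraction(1, 6)], dtype=object)
print(a.sum())    # 1/2, exact -- but slow and untyped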

I realize it is a bit unfair to ask for this to be ready before Geoffray's
code changes go in, but depending on the timelines for the 2.0.0 milestone, I
think this would be a useful thing to have. Otherwise, if some ABI/API
changes are needed after 2.0, we will be dragged down with this for
years. I am willing to spend time on this. Geoffray, does this sound
acceptable to you ?

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Slow Numpy/MKL vs Matlab/MKL

2011-12-07 Thread David Cournapeau
On Tue, Dec 6, 2011 at 5:31 PM, Oleg Mikulya olegmi...@gmail.com wrote:
 Hi,

 How can I make Numpy match Matlab in terms of performance? I have tried
 different options, using different MKL libraries and ICC versions, but
 Numpy is still below Matlab for certain basic tasks by ~2x. About 5 years
 ago I was able to get about the same speed, but not anymore. Matlab is
 supposed to use the same MKL, so what is the reason for such Numpy slowness
 (besides one, yet fundamental, task)?

Have you checked that the returned values are the same (up to some
precision) ? It may be that we don't use the same underlying LAPACK
function.
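
Not knowing which operations were measured, here is a minimal sketch of that kind of check: time two LAPACK code paths for the same mathematical problem and confirm the numbers agree, since the choice of driver alone can easily account for a 2x difference:

import time
import numpy as np

np.random.seed(0)
a = np.random.rand(1000, 1000)
s = a + a.T                       # symmetric, so several LAPACK drivers apply

t0 = time.time()
w1 = np.linalg.eigvalsh(s)        # symmetric eigenvalue driver
t1 = time.time()
w2 = np.linalg.eigvals(s)         # general driver, usually much slower
t2 = time.time()

print('eigvalsh: %.2fs   eigvals: %.2fs' % (t1 - t0, t2 - t1))
print('max difference: %g' % np.abs(np.sort(w1) - np.sort(w2.real)).max())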

cheers,

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Moving to gcc 4.* for win32 installers ?

2011-12-14 Thread David Cournapeau
On Tue, Dec 13, 2011 at 3:43 PM, Ralf Gommers
ralf.gomm...@googlemail.com wrote:
 On Sun, Oct 30, 2011 at 12:18 PM, David Cournapeau courn...@gmail.com
 wrote:

 On Thu, Oct 27, 2011 at 5:19 PM, Ralf Gommers
 ralf.gomm...@googlemail.com wrote:
  Hi David,
 
  On Thu, Oct 27, 2011 at 3:02 PM, David Cournapeau courn...@gmail.com
  wrote:
 
  Hi,
 
  I was wondering if we could finally move to a more recent version of
  compilers for official win32 installers. This would of course concern
  the next release cycle, not the ones where beta/rc are already in
  progress.
 
  Basically, the pros:
   - we will have to move at some point
  - gcc 4.* seems less buggy, especially for C++ and Fortran.
  - no need to maintain the msvcr90 voodoo
  The cons:
   - it will most likely break the ABI
   - we need to recompile atlas (but I can take care of it)
   - the biggest: it is difficult to combine gfortran with visual
  studio (more exactly you cannot link gfortran runtime to a visual
  studio executable). The only solution I could think of would be to
  recompile the gfortran runtime with Visual Studio, which for some
  reason does not sound very appealing :)
 
  To get the datetime changes to work with MinGW, we already concluded
  that
  building with 4.x is more or less required (without recognizing some of
  the
  points you list above). Changes to mingw32ccompiler to fix compilation
  with
  4.x went in in https://github.com/numpy/numpy/pull/156. It would be good
  if
  you could check those.

 I will look into it more carefully, but overall, it seems that
 building atlas 3.8.4, numpy and scipy with gcc 4.x works quite well.
 The main issue is that gcc 4.* adds some dependencies on mingw dlls.
 There are two options:
  - adding the dlls in the installers
  - statically linking those, which seems to be a bad idea
 (generalizing the dll boundaries problem to exception and things we
 would rather not care about:
 http://cygwin.com/ml/cygwin/2007-06/msg00332.html).

  It probably makes sense to make this move for numpy 1.7. If this breaks the
  ABI
  then it would be easiest to make numpy 1.7 the minimum required version
  for
  scipy 0.11.

 My thinking as well.


 Hi David, what is the current status of this issue? I kind of forgot this is
 a prerequisite for the next release when starting the 1.7.0 release thread.

The only issue at this point is the distribution of mingw dlls. I have
not found a way to do it nicely (where nicely means something that is
distributed within the numpy package). Given that those dlls are actually
versioned and seem to have a strong versioning policy, maybe we can
just install them inside the python installation ?

cheers,

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Negative indexing.

2012-01-15 Thread David Cournapeau
On Sat, Jan 14, 2012 at 11:53 PM, Nathan Faggian
nathan.fagg...@gmail.com wrote:
 Hi,

 I am finding it less than useful to have the negative index wrapping on 
 nd-arrays. Here is a short example:

 import numpy as np
 a = np.zeros((3, 3))
 a[:,2] = 1000
 print a[0,-1]
 print a[0,-1]
 print a[-1,-1]

 In all cases 1000 is printed out.

 What I am after is a way to say please don't wrap around and have negative 
 indices behave in a way I choose.  I know this is a standard thing - but is 
 there a way to override that behaviour that doesn't involve cython or rolling 
 my own resampler?

Although it could be possible with lots of work, it would most likely
be a bad idea. You will need to wrap something around your
model/data/etc... Could you explain a bit more what you have in mind ?
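
For what it's worth, a minimal sketch of that kind of wrapper (handling only plain integer indices) looks like this:

import numpy as np

class NoWrap(object):
    # Reject negative indices instead of wrapping around; anything else
    # (slices, arrays, ...) is passed through to the underlying array.
    def __init__(self, arr):
        self.arr = arr
    def __getitem__(self, idx):
        items = idx if isinstance(idx, tuple) else (idx,)
        for i in items:
            if isinstance(i, int) and i < 0:
                raise IndexError('negative index not allowed: %d' % i)
        return self.arr[idx]

a = NoWrap(np.zeros((3, 3)))
a.arr[:, 2] = 1000
print(a[0, 2])                    # 1000.0
try:
    print(a[0, -1])
except IndexError as e:
    print('refused: %s' % e)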

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Moving to gcc 4.* for win32 installers ?

2012-02-06 Thread David Cournapeau
On Sat, Feb 4, 2012 at 3:55 PM, Ralf Gommers
ralf.gomm...@googlemail.com wrote:


 On Wed, Dec 14, 2011 at 6:50 PM, Ralf Gommers ralf.gomm...@googlemail.com
 wrote:



 On Wed, Dec 14, 2011 at 3:04 PM, David Cournapeau courn...@gmail.com
 wrote:

 On Tue, Dec 13, 2011 at 3:43 PM, Ralf Gommers
 ralf.gomm...@googlemail.com wrote:
  On Sun, Oct 30, 2011 at 12:18 PM, David Cournapeau courn...@gmail.com
  wrote:
 
  On Thu, Oct 27, 2011 at 5:19 PM, Ralf Gommers
  ralf.gomm...@googlemail.com wrote:
   Hi David,
  
   On Thu, Oct 27, 2011 at 3:02 PM, David Cournapeau
   courn...@gmail.com
   wrote:
  
   Hi,
  
   I was wondering if we could finally move to a more recent version
   of
   compilers for official win32 installers. This would of course
   concern
   the next release cycle, not the ones where beta/rc are already in
   progress.
  
   Basically, the pros:
    - we will have to move at some point
   - gcc 4.* seems less buggy, especially for C++ and Fortran.
   - no need to maintain the msvcr90 voodoo
   The cons:
    - it will most likely break the ABI
    - we need to recompile atlas (but I can take care of it)
    - the biggest: it is difficult to combine gfortran with visual
   studio (more exactly you cannot link gfortran runtime to a visual
   studio executable). The only solution I could think of would be to
   recompile the gfortran runtime with Visual Studio, which for some
   reason does not sound very appealing :)
  
   To get the datetime changes to work with MinGW, we already concluded
   that
   building with 4.x is more or less required (without recognizing some
   of
   the
   points you list above). Changes to mingw32ccompiler to fix
   compilation
   with
   4.x went in in https://github.com/numpy/numpy/pull/156. It would be
   good
   if
   you could check those.
 
  I will look into it more carefully, but overall, it seems that
  building atlas 3.8.4, numpy and scipy with gcc 4.x works quite well.
  The main issue is that gcc 4.* adds some dependencies on mingw dlls.
  There are two options:
   - adding the dlls in the installers
   - statically linking those, which seems to be a bad idea
  (generalizing the dll boundaries problem to exception and things we
  would rather not care about:
  http://cygwin.com/ml/cygwin/2007-06/msg00332.html).
 
   It probably makes sense to make this move for numpy 1.7. If this breaks
   the
   ABI
   then it would be easiest to make numpy 1.7 the minimum required
   version
   for
   scipy 0.11.
 
  My thinking as well.
 
 
  Hi David, what is the current status of this issue? I kind of forgot
  this is
  a prerequisite for the next release when starting the 1.7.0 release
  thread.

 The only issue at this point is the distribution of mingw dlls. I have
 not found a way to do it nicely (where nicely means something that is
 distributed within the numpy package). Given that those dlls are actually
 versioned and seem to have a strong versioning policy, maybe we can
 just install them inside the python installation ?

 Although not ideal, I don't have a problem with that in principle.
 However, wouldn't it break installing without admin rights if Python is
 installed by the admin?


 David, do you have any more thoughts on this? Is there a final solution in
 sight? Anything I or anyone else can do to help?

I have not found a way to do it without installing the dlls alongside the
python libraries. That brings up the problem of how to install libraries
there from bdist_wininst/bdist_msi installers, which I have not had the
time to look at.

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] fast method to to count a particular value in a large matrix

2012-02-06 Thread David Cournapeau
On Mon, Feb 6, 2012 at 1:17 AM, Wes McKinney wesmck...@gmail.com wrote:


 Whenever I get motivated enough I'm going to make a pull request on
 NumPy with something like khash.h and start fixing all the O(N log N)
 algorithms floating around that ought to be O(N). NumPy should really
 have a match function similar to R's and a lot of other things.

khash.h is not the only thing that I'd like to use in numpy if I had
more time :)
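
For reference, the counting question in the subject line already has an O(N) answer in plain numpy; the hash-table discussion is about operations like R's match(), where numpy currently falls back on sorting. A minimal sketch of both:

import numpy as np

a = np.random.randint(0, 10, size=(2000, 2000))
print((a == 7).sum())                 # count of one value: a single O(N) pass

# membership/matching, the kind of operation khash would turn into O(N):
needles = np.array([3, 7, 42])
print(np.in1d(needles, a.ravel()))    # currently sort/unique based internally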

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Moving to gcc 4.* for win32 installers ?

2012-02-07 Thread David Cournapeau
On Tue, Feb 7, 2012 at 1:30 PM, Sturla Molden stu...@molden.no wrote:
 On 27.10.2011 15:02, David Cournapeau wrote:

    - we need to recompile atlas (but I can take care of it)
    - the biggest: it is difficult to combine gfortran with visual
 studio (more exactly you cannot link gfortran runtime to a visual
 studio executable).

 Why is that?

 I have used gfortran with Python on Windows a lot, never had a problem.

How did you link a library with mixed C and gfortran ?

 It's not like we are going to share CRT resources between C/Python and
 Fortran. That would be silly, regardless of compiler.

Well, that actually happens quite a bit in the libraries we depend on.
One solution could actually be removing any dependency on the fortran
runtime.

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Moving to gcc 4.* for win32 installers ?

2012-02-07 Thread David Cournapeau
On Tue, Feb 7, 2012 at 1:55 PM, Sturla Molden stu...@molden.no wrote:
 On 07.02.2012 14:38, Sturla Molden wrote:

 May I suggest GotoBLAS2 instead of ATLAS?

 Or OpenBLAS, which is GotoBLAS2 except it is still maintained.

I did not know GotoBLAS2 was open source (it wasn't last time I
checked). That's very useful information, I will look into it.

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] On making Numpy 1.7 a long term support release.

2012-02-10 Thread David Cournapeau
On Sun, Feb 5, 2012 at 7:19 AM, Ralf Gommers
ralf.gomm...@googlemail.com wrote:


 On Sun, Feb 5, 2012 at 7:33 AM, Travis Oliphant tra...@continuum.io wrote:

 I think supporting Python 2.5 and above is completely fine.  I'd even be
 in favor of bumping up to Python 2.6 for NumPy 1.7 and certainly for NumPy
 2.8

 +1 for dropping Python 2.5 support also for an LTS release. That will make
 it a lot easier to use str.format() and the with statement (plus many other
 things) going forward, without having to think about if your changes can be
 backported to that LTS release.

At the risk of sounding like a broken record, I would really like to
stay at 2.4, especially for a long term release :) This is still the
basis used by a lot of long-term python products. If we can support
2.4 for an LTS, I would then be much more comfortable with allowing a bump
to 2.5 for 1.8.

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] On making Numpy 1.7 a long term support release.

2012-02-11 Thread David Cournapeau
On Sat, Feb 11, 2012 at 9:08 AM, Ralf Gommers
ralf.gomm...@googlemail.com wrote:


 On Fri, Feb 10, 2012 at 8:51 PM, Ralf Gommers ralf.gomm...@googlemail.com
 wrote:



 On Fri, Feb 10, 2012 at 10:25 AM, David Cournapeau courn...@gmail.com
 wrote:

 On Sun, Feb 5, 2012 at 7:19 AM, Ralf Gommers
 ralf.gomm...@googlemail.com wrote:
 
 
  On Sun, Feb 5, 2012 at 7:33 AM, Travis Oliphant tra...@continuum.io
  wrote:
 
  I think supporting Python 2.5 and above is completely fine.  I'd even
  be
  in favor of bumping up to Python 2.6 for NumPy 1.7 and certainly for
  NumPy
  2.8
 
  +1 for dropping Python 2.5 support also for an LTS release. That will
  make
  it a lot easier to use str.format() and the with statement (plus many
  other
  things) going forward, without having to think about if your changes
  can be
  backported to that LTS release.

  At the risk of sounding like a broken record, I would really like to
  stay at 2.4, especially for a long term release :) This is still the
  basis used by a lot of long-term python products. If we can support
  2.4 for an LTS, I would then be much more comfortable with allowing a bump
  to 2.5 for 1.8.


 At the very least someone should step up to do the testing or maintain a
 buildbot for Python 2.4 then. Also for scipy, assuming that scipy keeps
 supporting the same Python versions as numpy.

 Here's a list of Python requirements for other important scientific python
 projects:
  - ipython: >= 2.6
  - matplotlib: v1.1 supports 2.4-2.7, v1.2 will support >= 2.6
  - scikit-learn: >= 2.6
  - scikit-image: >= 2.5
  - scikits.statsmodels: >= 2.5 (next release probably >= 2.6)

 That there are still some projects/products out there that still use Python
 2.4 (some examples of such products would be nice by the way) is not enough
 of a reason by itself to continue to support it in new releases. There has
 to be a good reason for those products to require the very latest numpy,
 even though they are fine with a very old Python and older versions of
 almost any other Python package.

I don't think that last argument is relevant for an LTS. Numpy is used
in environments where you cannot easily control what's installed. RHEL
still uses python 2.4 and will be supported until 2014 in the
production phase.

As for projects still using python 2.4: looking at the most downloaded
packages from this list,
http://taichino.appspot.com/pypi_ranking/modules?page=1, most of them
support python 2.4 or below; lxml, zc.buildout, setuptools, pip,
virtualenv and sqlalchemy do. Numpy itself is also used outside the
strict scientific realm, which is why I am a bit wary about just
comparing with other scientific python packages.

Now, if everybody else is against it, I don't want to be a pain about
it either :)

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] On making Numpy 1.7 a long term support release.

2012-02-11 Thread David Cournapeau
On Sat, Feb 11, 2012 at 1:30 PM, Ralf Gommers
ralf.gomm...@googlemail.com wrote:


 As Bruce said, 29 Feb 2012 and not 2014:
 https://access.redhat.com/support/policy/updates/errata/

I think Bruce and I were not talking about the same RHEL version (4 vs 5).

Let me see if I can set up a buildbot for 2.4.

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Migrating issues to GitHub

2012-02-11 Thread David Cournapeau
On Sat, Feb 11, 2012 at 9:49 PM, Mark Wiebe mwwi...@gmail.com wrote:
 On Sat, Feb 11, 2012 at 3:12 PM, Eric Firing efir...@hawaii.edu wrote:

  On 02/11/2012 10:44 AM, Travis Oliphant wrote: [snip]



  2) You must be an admin to label an issue (i.e. set it as a bug,
  enhancement, or so forth).

 A third problem is that the entire style of presentation is poorly
 designed from a use standpoint, in comparison to the sourceforge tracker
 which mpl used previously.  The github tracker appears to have been
 designed by a graphics person, not a software maintainer.  The
 information density in the issue list is very low; it is impossible to
 scan a large number of issues at once; there doesn't seem to be any
 useful sorting and selection mechanism.


 The lack of a tabular way to mass-edit bugs is one of my biggest problems
 with the current trac. One thing that ideally we could do regularly is to
 rapidly triage 100s of bugs. Currently trac requires you to go through them
 one by one, like harvesting wheat with a scythe instead of a combine. Users
 who are mentioned in a lot of tickets also get spammed by a large number of
 messages, instead of getting a single summary update of all the triaging that
 was done.

 Does the github bug tracker have a good story about mass bug-updating?

Github is better than trac on that issue: updating the milestone for
many bugs at once is simple. You don't have priorities, etc…, though.
The REST API also makes it possible, in principle, to write tools to
automate the repetitive tasks.
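
As a minimal sketch of that idea (the token, milestone number and issue numbers below are placeholders, and this assumes the current v3 REST API), bulk-retargeting issues is a short script:

import json
import requests   # third-party HTTP library, used here only for brevity

TOKEN = 'your-oauth-token'                     # placeholder
HEADERS = {'Authorization': 'token ' + TOKEN}

for number in (1234, 1240, 1258):              # placeholder issue numbers
    url = 'https://api.github.com/repos/numpy/numpy/issues/%d' % number
    r = requests.patch(url, data=json.dumps({'milestone': 5}),
                       headers=HEADERS)
    print('#%d -> %d' % (number, r.status_code))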

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Updated differences between 1.5.1 to 1.6.1

2012-02-14 Thread David Cournapeau
Hi Travis,

It is great that some resources can be spent to have people paid to
work on NumPy. Thank you for making that happen.

I am slightly confused about roadmaps for numpy 1.8 and 2.0. This
needs discussion on the ML, and our release manager currently is Ralf
- he is the one who ultimately decides what goes when. I am also not
completely comfortable with a roadmap being advertised at PyCon that does
not come from the community.

regards,

David

On Tue, Feb 14, 2012 at 9:03 AM, Travis Oliphant tra...@continuum.io wrote:
 For reference, here is the table that shows the actual changes between 1.5.1 
 and 1.6.1 at least on 64-bit platforms in terms of type-casting.  I updated 
 the comparison code to throw out changes that are just spelling differences 
 (i.e. where 1.6.1 chooses to create an output dtype with an 'L' character 
 code instead of a 'Q', which on a 64-bit system is effectively the same).
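
A minimal sketch (not the actual comparison script referred to above) of how such a table can be produced: run the loop below under each numpy version and diff the output to see where the promoted dtype changed:

import numpy as np

dtypes = [np.int8, np.uint8, np.int32, np.int64, np.float32, np.float64]
for left in dtypes:
    for right in dtypes:
        # array-with-array and array-with-scalar promotion, the two cases
        # where 1.5.1 and 1.6.1 can differ
        aa = (np.zeros(2, dtype=left) + np.zeros(2, dtype=right)).dtype
        asc = (np.zeros(2, dtype=left) + np.dtype(right).type(1)).dtype
        print('%-8s %-8s  array+array -> %-8s  array+scalar -> %s'
              % (np.dtype(left).name, np.dtype(right).name, aa.name, asc.name))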




        Mostly I'm happy with the changes (after a cursory review).  As I 
 expected, there are some real improvements.    Of course, I haven't looked at 
 the changes that occur when the scalar being used does not fit in the range 
 of the array data-type.   I don't see this change documented in the link that 
 Mark sent previously.   Is it somewhere else?   Also, it looks like 
 previously object arrays were returned for some coercions which now simply 
 fail.  Is that an expected result?

 At this point, I'm not going to recommend changes to 1.7 to deal with these 
 type-casting changes --- at least this thread will serve to show some of what 
 changes occurred if it bites anyone in the future.

 However, I will have other changes to NumPy 1.X that I will be proposing and 
 writing (and directing other people to write as well).  After some period of 
 quiet, this might be a refreshing change.  But, not all may see it that way.  
  I'm confident that we can resolve any concerns people might have.   Any 
 feature additions will preserve backward compatibility in NumPy 1.X.   Mark 
 W. will be helping with some of these changes, but mostly he will be working 
 on NumPy 2.0 which we have tentatively targeted for next January.    We have 
 a tentative target for NumPy 1.8 in June/July.    So far, there are three 
 developers who will be working on NumPy 1.8 (me, Francesc Alted, and Bryan 
 Van de Ven).  Mark Wiebe is slated to help us, as well, but I would like to 
 sponsor him as much as possible on the work for NumPy 2.0.    If anyone else 
 would like to join us, please let me know off-list.     There is room for 
 another talented person on our team.

 In addition to a few select features in NumPy 1.8 (a list of which will 
 follow in a later email),  we will also be working on reviewing the list of 
 bugs on Trac and fixing them, writing tests, and improving docstrings.    I 
 would also like to improve the state of the bug-tracker and get in place a 
 continuous integration system for NumPy.   We will be advertising our NumPy 
 1.8 roadmap and our NumPy 2.0 roadmap at PyCon, and are working on documents 
 that describe plans which we are hoping will be reviewed and discussed on 
 this list.

 I know that having more people working on the code-base for several months 
 will be a different scenario than what has transpired in the past.   
 Hopefully, this will be a productive time for everybody and our sometimes 
 different perspectives will be able to coalesce into a better result for more 
 people.

 Best regards,

 -Travis

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] [cython-users] Discussion with Guido van Rossum and (hopefully) core python-dev on scientific Python and Python3

2012-02-14 Thread David Cournapeau
On Mon, Feb 13, 2012 at 9:55 PM, Fernando Perez fperez@gmail.com wrote:
 Hi folks,

 [ I'm broadcasting this widely for maximum reach, but I'd appreciate
 it if replies can be kept to the *numpy* list, which is sort of the
 'base' list for scientific/numerical work.  It will make it much
 easier to organize a coherent set of notes later on.  Apology if
 you're subscribed to all and get it 10 times. ]

 As part of the PyData workshop (http://pydataworkshop.eventbrite.com)
 to be held March 2 and 3 at the Mountain View Google offices, we have
 scheduled a session for an open discussion with Guido van Rossum and
 hopefully as many core python-dev members who can make it.  We wanted
 to seize the combined opportunity of the PyData workshop bringing a
 number of 'scipy people' to Google with the timeline for Python 3.3,
 the first release after the Python language moratorium, being within
 sight: http://www.python.org/dev/peps/pep-0398.

 While a number of scientific Python packages are already available for
 Python 3 (either in released form or in their master git branches),
 it's fair to say that there hasn't been a major transition of the
 scientific community to Python3.  Since there is no more development
 being done on the Python2 series, eventually we will all want to find
 ways to make this transition, and we think that this is an excellent
 time to engage the core python development team and consider ideas
 that would make Python3 generally a more appealing language for
 scientific work.  Guido has made it clear that he doesn't speak for
 the day-to-day development of Python anymore, so we all should be
 aware that any ideas that come out of this panel will still need to be
 discussed with python-dev itself via standard mechanisms before
 anything is implemented.  Nonetheless, the opportunity for a solid
 face-to-face dialog for brainstorming was too good to pass up.

 The purpose of this email is then to solicit, from all of our
 community, ideas for this discussion.  In a week or so we'll need to
 summarize the main points brought up here and make a more concrete
 agenda out of it; I will also post a summary of the meeting afterwards
 here.

 Anything is a valid topic, some points just to get the conversation started:

 - Extra operators/PEP 225.  Here's a summary from the last time we
 went over this, years ago at Scipy 2008:
 http://mail.scipy.org/pipermail/numpy-discussion/2008-October/038234.html,
 and the current status of the document we wrote about it is here:
 file:///home/fperez/www/site/_build/html/py4science/numpy-pep225/numpy-pep225.html.

 - Improved syntax/support for rationals or decimal literals?  While
 Python now has both decimals
 (http://docs.python.org/library/decimal.html) and rationals
 (http://docs.python.org/library/fractions.html), they're quite clunky
 to use because they require full constructor calls.  Guido has
 mentioned in previous discussions toying with ideas about support for
 different kinds of numeric literals...

 - Using the numpy docstring standard python-wide, and thus having
 python improve the pathetic state of the stdlib's docstrings?  This is
 an area where our community is light years ahead of the standard
 library, but we'd all benefit from Python itself improving on this
 front.  I'm toying with the idea of giving a lightning talk at PyCon
 about this, comparing the great, robust culture and tools of good
 docstrings across the Scipy ecosystem with the sad, sad state of
 docstrings in the stdlib.  It might spur some movement on that front
 from the stdlib authors, esp. if the core python-dev team realizes the
 value and benefit it can bring (at relatively low cost, given how most
 of the information does exist, it's just in the wrong places).  But
 more importantly for us, if there was truly a universal standard for
 high-quality docstrings across Python projects, building good
 documentation/help machinery would be a lot easier, as we'd know what
 to expect and search for (such as rendering them nicely in the ipython
 notebook, providing high-quality cross-project help search, etc).

 - Literal syntax for arrays?  Sage has been floating a discussion
 about a literal matrix syntax
 (https://groups.google.com/forum/#!topic/sage-devel/mzwepqZBHnA).  For
 something like this to go into python in any meaningful way there
 would have to be core multidimensional arrays in the language, but
 perhaps it's time to think about a piece of the numpy array itself
 into Python?  This is one of the more 'out there' ideas, but after
 all, that's the point of a discussion like this, especially
 considering we'll have both Travis and Guido in one room.

 - Other syntactic sugar? Sage has a..b = range(a, b+1), which I
 actually think is  both nice and useful... There's also the question
 of allowing a:b:c notation outside of [], which has come up a few
 times in conversation over the last few years. Others?

 - The packaging quagmire?  This continues to be a problem, 

Re: [Numpy-discussion] Numpy governance update

2012-02-15 Thread David Cournapeau
On Wed, Feb 15, 2012 at 10:30 PM, Peter Wang pw...@streamitive.com wrote:
 On Feb 15, 2012, at 3:36 PM, Matthew Brett wrote:

 Honestly - as I was saying to Alan and indirectly to Ben - any formal
 model - at all - is preferable to the current situation. Personally, I
 would say that making the founder of a company, which is working to
 make money from Numpy, the only decision maker on numpy - is - scary.

 How is this different from the situation of the last 4 years?  Travis was 
 President at Enthought, which makes money from not only Numpy but SciPy as 
 well. In addition to employing Travis, Enthought also employs many other 
 key contributors to Numpy and Scipy, like Robert and David.  Furthermore, the 
 Scipy and Numpy mailing lists and repos and web pages were all hosted at 
 Enthought.  If they didn't like how a particular discussion was going, they 
 could have memory-holed the entire conversation from the archives, or worse 
 yet, revoked commit access and reverted changes.

I actually think it is somewhat different. For one, while Travis was
at Enthought, he contributed much less to the discussions (by his own
account), so the risk of conflict of interest was not very high. My
own contributions to numpy since I joined Enthought are close to
nil as well :)

There have been cases of disagreement on NumPy: in any case where
the decision taken by people from one company prevails, you will
not be able to prevent people from thinking the interests of the
company prevailed. In numpy, when someone made a suggestion and there
was not enough review, the feature generally went in. This is
fundamentally different from most open source projects I am aware of,
and could go bad when combined with my previous point.

As far as I am concerned, the following would be enough to resolve any issues:
  -  having one (or more) persons outside any company interest (e.g.
Chuck, Pauli) with a veto.
  -  no significant feature goes in without a review from people
outside the organization it is coming from.

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-17 Thread David Cournapeau
Hi Travis,

On Thu, Feb 16, 2012 at 10:39 PM, Travis Oliphant tra...@continuum.io wrote:
 Mark Wiebe and I have been discussing off and on (as well as talking with 
 Charles) a good way forward to balance two competing desires:

        * addition of new features that are needed in NumPy
        * improving the code-base generally and moving towards a more 
 maintainable NumPy

 I know there are loud voices for just focusing on the second of these and 
 avoiding the first until we have finished that.  I recognize the need to 
 improve the code base, but I will also be pushing for improvements to the 
 feature-set and user experience in the process.

 As a result, I am proposing a rough outline for releases over the next year:

        * NumPy 1.7 to come out as soon as the serious bugs can be eliminated. 
  Bryan, Francesc, Mark, and I are able to help triage some of those.

        * NumPy 1.8 to come out in July which will have as many ABI-compatible 
 feature enhancements as we can add while improving test coverage and code 
 cleanup.   I will post to this list more details of what we plan to address 
 with it later.    Included for possible inclusion are:
        * resolving the NA/missing-data issues
        * finishing group-by
        * incorporating the start of label arrays
        * incorporating a meta-object
        * a few new dtypes (variable-length string, variable-length unicode 
 and an enum type)
        * adding ufunc support for flexible dtypes and possibly structured 
 arrays
        * allowing generalized ufuncs to work on more kinds of arrays besides 
 just contiguous
        * improving the ability for NumPy to receive JIT-generated function 
 pointers for ufuncs and other calculation opportunities
        * adding filters to Input and Output
        * simple computed fields for dtypes
        * accepting a Data-Type specification as a class or JSON file
        * work towards improving the dtype-addition mechanism
        * re-factoring of code so that it can compile with a C++ compiler and 
 be minimally dependent on Python data-structures.

This is a pretty exciting list of features. What is the rationale for
code being compiled as C++ ? IMO, it will be difficult to do so
without preventing useful C constructs, and without removing some of
the existing features (like our use of C99 complex). The subset that
is both C and C++ compatible is quite constraining.

cheers,

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-17 Thread David Cournapeau
On Fri, Feb 17, 2012 at 3:39 PM, Charles R Harris
charlesr.har...@gmail.com wrote:


 On Fri, Feb 17, 2012 at 8:01 AM, David Cournapeau courn...@gmail.com
 wrote:

 Hi Travis,

 On Thu, Feb 16, 2012 at 10:39 PM, Travis Oliphant tra...@continuum.io
 wrote:
  Mark Wiebe and I have been discussing off and on (as well as talking
  with Charles) a good way forward to balance two competing desires:
 
         * addition of new features that are needed in NumPy
         * improving the code-base generally and moving towards a more
  maintainable NumPy
 
   I know there are loud voices for just focusing on the second of these
  and avoiding the first until we have finished that.  I recognize the need 
  to
  improve the code base, but I will also be pushing for improvements to the
  feature-set and user experience in the process.
 
  As a result, I am proposing a rough outline for releases over the next
  year:
 
         * NumPy 1.7 to come out as soon as the serious bugs can be
  eliminated.  Bryan, Francesc, Mark, and I are able to help triage some of
  those.
 
         * NumPy 1.8 to come out in July which will have as many
  ABI-compatible feature enhancements as we can add while improving test
  coverage and code cleanup.   I will post to this list more details of what
  we plan to address with it later.    Included for possible inclusion are:
         * resolving the NA/missing-data issues
         * finishing group-by
         * incorporating the start of label arrays
         * incorporating a meta-object
         * a few new dtypes (variable-length string, variable-length
  unicode and an enum type)
         * adding ufunc support for flexible dtypes and possibly
  structured arrays
         * allowing generalized ufuncs to work on more kinds of arrays
  besides just contiguous
         * improving the ability for NumPy to receive JIT-generated
  function pointers for ufuncs and other calculation opportunities
         * adding filters to Input and Output
         * simple computed fields for dtypes
         * accepting a Data-Type specification as a class or JSON file
         * work towards improving the dtype-addition mechanism
         * re-factoring of code so that it can compile with a C++ compiler
  and be minimally dependent on Python data-structures.

 This is a pretty exciting list of features. What is the rationale for
 code being compiled as C++ ? IMO, it will be difficult to do so
 without preventing useful C constructs, and without removing some of
 the existing features (like our use of C99 complex). The subset that
 is both C and C++ compatible is quite constraining.


 I'm in favor of this myself, C++ would allow a lot of code cleanup and make it
 easier to provide an extensible base, I think it would be a natural fit with
 numpy. Of course, some C++ projects become tangled messes of inheritance,
 but I'd be very interested in seeing what a good C++ designer like Mark,
 intimately familiar with the numpy code base, could do. This opportunity
 might not come by again anytime soon and I think we should grab onto it. The
 initial step would be a release whose code that would compile in both C/C++,
 which mostly comes down to removing C++ keywords like 'new'.

C++ will make integration with external environments much harder
(calling a C++ library from a non C++ program is very hard, especially
for cross-platform projects), and I am not convinced by the more
extensible argument.

Making the numpy C code buildable by a C++ compiler is harder than
removing keywords.

 I did suggest running it by you for build issues, so please raise any you
 can think of. Note that MatPlotLib is in C++, so I don't think the problems
 are insurmountable. And choosing a set of compilers to support is something
 that will need to be done.

I don't know for matplotlib, but for scipy, quite a few issues were
caused by our C++ extensions in scipy.sparse. But build issues are
not a strong argument against C++ - I am sure those could be worked
out.

regards,

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-17 Thread David Cournapeau
I don't think C++ has any significant advantage over C for high performance
libraries. I am not convinced by the number of people argument either: it
is not my experience that C++ is easier to maintain in an open source
context, where the level of people is far from consistent. I doubt many
people did not contribute to numpy because it is in C instead of C++. While
this is somewhat subjective, there are reasons that C is much more common
than C++ in that context.

I would much rather move most parts to Cython to solve subtle ref counting
issues, typically.

The only way that I know of to have a stable and usable ABI is to wrap the
C++ code in C. Wrapping C++ libraries in Python has always been a pain in
my experience. How are templates or exceptions handled across languages? It
will also be a significant issue on Windows with open source compilers.

Interestingly, the API from clang exported to other languages is in C...

David
On 17 Feb 2012 18:21, Mark Wiebe mwwi...@gmail.com wrote:

 On Fri, Feb 17, 2012 at 11:52 AM, Eric Firing efir...@hawaii.edu wrote:

 On 02/17/2012 05:39 AM, Charles R Harris wrote:
 
 
  On Fri, Feb 17, 2012 at 8:01 AM, David Cournapeau courn...@gmail.com
  mailto:courn...@gmail.com wrote:
 
  Hi Travis,
 
  On Thu, Feb 16, 2012 at 10:39 PM, Travis Oliphant
  tra...@continuum.io mailto:tra...@continuum.io wrote:
Mark Wiebe and I have been discussing off and on (as well as
  talking with Charles) a good way forward to balance two competing
  desires:
   
   * addition of new features that are needed in NumPy
   * improving the code-base generally and moving towards a
  more maintainable NumPy
   
I know there are loud voices for just focusing on the second of
  these and avoiding the first until we have finished that.  I
  recognize the need to improve the code base, but I will also be
  pushing for improvements to the feature-set and user experience in
  the process.
   
As a result, I am proposing a rough outline for releases over
the
  next year:
   
   * NumPy 1.7 to come out as soon as the serious bugs can
be
  eliminated.  Bryan, Francesc, Mark, and I are able to help triage
  some of those.
   
   * NumPy 1.8 to come out in July which will have as many
  ABI-compatible feature enhancements as we can add while improving
  test coverage and code cleanup.   I will post to this list more
  details of what we plan to address with it later.Included for
  possible inclusion are:
   * resolving the NA/missing-data issues
   * finishing group-by
   * incorporating the start of label arrays
   * incorporating a meta-object
   * a few new dtypes (variable-length string,
  variable-length unicode and an enum type)
   * adding ufunc support for flexible dtypes and possibly
  structured arrays
   * allowing generalized ufuncs to work on more kinds of
  arrays besides just contiguous
   * improving the ability for NumPy to receive
JIT-generated
  function pointers for ufuncs and other calculation opportunities
   * adding filters to Input and Output
   * simple computed fields for dtypes
   * accepting a Data-Type specification as a class or JSON
file
   * work towards improving the dtype-addition mechanism
   * re-factoring of code so that it can compile with a C++
  compiler and be minimally dependent on Python data-structures.
 
  This is a pretty exciting list of features. What is the rationale
for
  code being compiled as C++ ? IMO, it will be difficult to do so
  without preventing useful C constructs, and without removing some
of
  the existing features (like our use of C99 complex). The subset
that
  is both C and C++ compatible is quite constraining.
 
 
  I'm in favor of this myself, C++ would allow a lot of code cleanup and
make
  it easier to provide an extensible base, I think it would be a natural
  fit with numpy. Of course, some C++ projects become tangled messes of
  inheritance, but I'd be very interested in seeing what a good C++
  designer like Mark, intimately familiar with the numpy code base, could
  do. This opportunity might not come by again anytime soon and I think
we
  should grab onto it. The initial step would be a release whose code
that
  would compile in both C/C++, which mostly comes down to removing C++
  keywords like 'new'.
 
  I did suggest running it by you for build issues, so please raise any
  you can think of. Note that MatPlotLib is in C++, so I don't think the
  problems are insurmountable. And choosing a set of compilers to support
  is something that will need to be done.

 It's true that matplotlib relies heavily on C++, both via the Agg
 library and in its own extension code.  Personally

Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-17 Thread David Cournapeau
On 17 Feb 2012 17:58, Mark Wiebe mwwi...@gmail.com wrote:

 On Fri, Feb 17, 2012 at 10:27 AM, David Cournapeau courn...@gmail.com
wrote:

 On Fri, Feb 17, 2012 at 3:39 PM, Charles R Harris
 charlesr.har...@gmail.com wrote:
 
 
  On Fri, Feb 17, 2012 at 8:01 AM, David Cournapeau courn...@gmail.com
  wrote:
 
  Hi Travis,
 
  On Thu, Feb 16, 2012 at 10:39 PM, Travis Oliphant tra...@continuum.io

  wrote:
   Mark Wiebe and I have been discussing off and on (as well as talking
   with Charles) a good way forward to balance two competing desires:
  
  * addition of new features that are needed in NumPy
  * improving the code-base generally and moving towards a more
   maintainable NumPy
  
    I know there are loud voices for just focusing on the second of
these
   and avoiding the first until we have finished that.  I recognize
the need to
   improve the code base, but I will also be pushing for improvements
to the
   feature-set and user experience in the process.
  
   As a result, I am proposing a rough outline for releases over the
next
   year:
  
  * NumPy 1.7 to come out as soon as the serious bugs can be
   eliminated.  Bryan, Francesc, Mark, and I are able to help triage
some of
   those.
  
  * NumPy 1.8 to come out in July which will have as many
   ABI-compatible feature enhancements as we can add while improving
test
   coverage and code cleanup.   I will post to this list more details
of what
   we plan to address with it later.Included for possible
inclusion are:
  * resolving the NA/missing-data issues
  * finishing group-by
  * incorporating the start of label arrays
  * incorporating a meta-object
   * a few new dtypes (variable-length string, variable-length
   unicode and an enum type)
  * adding ufunc support for flexible dtypes and possibly
   structured arrays
  * allowing generalized ufuncs to work on more kinds of arrays
   besides just contiguous
  * improving the ability for NumPy to receive JIT-generated
   function pointers for ufuncs and other calculation opportunities
  * adding filters to Input and Output
  * simple computed fields for dtypes
  * accepting a Data-Type specification as a class or JSON file
  * work towards improving the dtype-addition mechanism
  * re-factoring of code so that it can compile with a C++
compiler
   and be minimally dependent on Python data-structures.
 
  This is a pretty exciting list of features. What is the rationale for
  code being compiled as C++ ? IMO, it will be difficult to do so
  without preventing useful C constructs, and without removing some of
  the existing features (like our use of C99 complex). The subset that
  is both C and C++ compatible is quite constraining.
 
 
  I'm in favor of this myself, C++ would allow a lot of code cleanup and
make it
  easier to provide an extensible base, I think it would be a natural
fit with
  numpy. Of course, some C++ projects become tangled messes of
inheritance,
  but I'd be very interested in seeing what a good C++ designer like
Mark,
  intimately familiar with the numpy code base, could do. This
opportunity
  might not come by again anytime soon and I think we should grab onto
it. The
  initial step would be a release whose code that would compile in both
C/C++,
  which mostly comes down to removing C++ keywords like 'new'.

 C++ will make integration with external environments much harder
 (calling a C++ library from a non C++ program is very hard, especially
 for cross-platform projects), and I am not convinced by the more
 extensible argument.


 The whole of NumPy could be written utilizing C++ extensively while still
using exactly the same API and ABI numpy has now. C++ does not force
anything about API/ABI design decisions.

 One good document to read about how a major open source project
transitioned from C to C++ is about gcc. Their points comparing C and C++
apply to numpy quite well, and being compiler authors, they're intimately
familiar with ABI and performance issues:

 http://gcc.gnu.org/wiki/gcc-in-cxx#The_gcc-in-cxx_branch

 Making the numpy C code buildable by a C++ compiler is harder than
 removing keywords.


 Certainly, but it's not a difficult task for someone who's familiar with
both C and C++.


  I did suggest running it by you for build issues, so please raise any
you
  can think of. Note that MatPlotLib is in C++, so I don't think the
problems
  are insurmountable. And choosing a set of compilers to support is
something
  that will need to be done.

 I don't know for matplotlib, but for scipy, quite a few issues were
  caused by our C++ extensions in scipy.sparse. But build issues are
  not a strong argument against C++ - I am sure those could be worked
 out.


 On this topic, I'd like to ask what it would take to change the default
warning levels in all the build configurations? Building with no warnings
under high warning

Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-17 Thread David Cournapeau
On 18 Feb 2012 00:58, Charles R Harris charlesr.har...@gmail.com wrote:



 On Fri, Feb 17, 2012 at 4:44 PM, David Cournapeau courn...@gmail.com
wrote:

 I don't think c++ has any significant advantage over c for high
performance libraries. I am not convinced by the number of people argument
either: it is not my experience that c++ is easier to maintain in an open
source context, where the level of people is far from consistent. I doubt
many people did not contribute to numpy because it is in c instead of c++.
While this is somehow subjective, there are reasons that c is much more
common than c++ in that context.


 I think C++ offers much better tools than C for the sort of things in
Numpy. The compiler will take care of lots of things that now have to be
hand crafted and I wouldn't be surprised to see the code size shrink by a
significant factor.

There are two arguments here: that the c code in numpy could be improved, and
that c++ is the best way to do it. Nobody so far has argued against the
first argument. I think there is a lot of room to improve things while
still being in C.

You say that the compiler would take care of a lot of things: so far, the
main thing that has been mentioned is raii. While it is certainly a useful
concept, I find it extremely difficult to use correctly in real
applications. Things that are simple to do on simple examples become really
hard to deal with when features start to interact with each other (which is
always the case in c++). Writing robust code that is exception safe with the stl
requires a lot of knowledge. I don't have this knowledge. I have no doubt
Mark has this knowledge. Does anyone else on this list?
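
(For concreteness, since RAII keeps coming up: it simply means tying a
resource to an object's lifetime so the destructor releases it on every exit
path. A minimal sketch, not numpy code, with made-up names:

#include <cstdio>

// Minimal RAII guard: the resource is acquired in the constructor and
// released in the destructor, so it is released on every exit path,
// including early returns and exceptions thrown by the body.
class FileGuard {
public:
    explicit FileGuard(const char* path) : f_(std::fopen(path, "rb")) {}
    ~FileGuard() { if (f_) std::fclose(f_); }
    std::FILE* get() const { return f_; }
private:
    std::FILE* f_;
    FileGuard(const FileGuard&);             // non-copyable
    FileGuard& operator=(const FileGuard&);  // non-assignable
};

bool read_magic(const char* path) {
    FileGuard file(path);            // fclose happens automatically when
    if (!file.get()) return false;   // 'file' goes out of scope
    char magic[6] = {0};
    return std::fread(magic, 1, 6, file.get()) == 6;
}

The difficulty David refers to is that once such guards, containers and
exceptions interact, reasoning about every exit path gets much harder.)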


 I would much rather move most part to cython to solve subtle ref
counting issues, typically.


 Not me, I'd rather write most stuff in C/C++ than Cython, C is cleaner ;)
Cython good for the Python interface, but once past that barrier C is
easier, and C++ has lots of useful things.

 The only way that I know of to have a stable and usable abi is to wrap
the c++ code in c. Wrapping c++ libraries in python has always been a pain
in my experience. How are templates or exceptions handled across languages ?
It will also be a significant issue on windows with open source compilers.

 Interestingly, the api from clang exported to other languages is in c...


 The api isn't the same as the implementation language. I wouldn't
prejudge these issues, but some indication of how they would be solved
might be helpful.

I understand that api and implementation language are not the same: you
just quoted the part where I was mentioning it :)

Assuming a c++ implementation with a c api, how will you deal with
templates ? How will you deal with exceptions ? How will you deal with
exceptions crossing dll/so boundaries between different compilers, which is a very
common situation in our community ?

david


 snip

 Chuck


 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-17 Thread David Cournapeau
On 17 Feb 2012 18:21, Mark Wiebe mwwi...@gmail.com wrote:

 On Fri, Feb 17, 2012 at 11:52 AM, Eric Firing efir...@hawaii.edu wrote:

 On 02/17/2012 05:39 AM, Charles R Harris wrote:
 
 
  On Fri, Feb 17, 2012 at 8:01 AM, David Cournapeau courn...@gmail.com
  mailto:courn...@gmail.com wrote:
 
  Hi Travis,
 
  On Thu, Feb 16, 2012 at 10:39 PM, Travis Oliphant
  tra...@continuum.io mailto:tra...@continuum.io wrote:
Mark Wiebe and I have been discussing off and on (as well as
  talking with Charles) a good way forward to balance two competing
  desires:
   
   * addition of new features that are needed in NumPy
   * improving the code-base generally and moving towards a
  more maintainable NumPy
   
    I know there are loud voices for just focusing on the second of
  these and avoiding the first until we have finished that.  I
  recognize the need to improve the code base, but I will also be
  pushing for improvements to the feature-set and user experience in
  the process.
   
As a result, I am proposing a rough outline for releases over
the
  next year:
   
   * NumPy 1.7 to come out as soon as the serious bugs can
be
  eliminated.  Bryan, Francesc, Mark, and I are able to help triage
  some of those.
   
   * NumPy 1.8 to come out in July which will have as many
  ABI-compatible feature enhancements as we can add while improving
  test coverage and code cleanup.   I will post to this list more
  details of what we plan to address with it later.Included for
  possible inclusion are:
   * resolving the NA/missing-data issues
   * finishing group-by
   * incorporating the start of label arrays
   * incorporating a meta-object
   * a few new dtypes (variable-length string,
   variable-length unicode and an enum type)
   * adding ufunc support for flexible dtypes and possibly
  structured arrays
   * allowing generalized ufuncs to work on more kinds of
  arrays besides just contiguous
   * improving the ability for NumPy to receive
JIT-generated
  function pointers for ufuncs and other calculation opportunities
   * adding filters to Input and Output
   * simple computed fields for dtypes
   * accepting a Data-Type specification as a class or JSON
file
   * work towards improving the dtype-addition mechanism
   * re-factoring of code so that it can compile with a C++
  compiler and be minimally dependent on Python data-structures.
 
  This is a pretty exciting list of features. What is the rationale
for
  code being compiled as C++ ? IMO, it will be difficult to do so
  without preventing useful C constructs, and without removing some
of
  the existing features (like our use of C99 complex). The subset
that
  is both C and C++ compatible is quite constraining.
 
 
  I'm in favor of this myself, C++ would allow a lot code cleanup and
make
  it easier to provide an extensible base, I think it would be a natural
  fit with numpy. Of course, some C++ projects become tangled messes of
  inheritance, but I'd be very interested in seeing what a good C++
  designer like Mark, intimately familiar with the numpy code base, could
  do. This opportunity might not come by again anytime soon and I think
we
  should grab onto it. The initial step would be a release whose code
that
  would compile in both C/C++, which mostly comes down to removing C++
  keywords like 'new'.
 
  I did suggest running it by you for build issues, so please raise any
  you can think of. Note that MatPlotLib is in C++, so I don't think the
  problems are insurmountable. And choosing a set of compilers to support
  is something that will need to be done.

 It's true that matplotlib relies heavily on C++, both via the Agg
 library and in its own extension code.  Personally, I don't like this; I
 think it raises the barrier to contributing.  C++ is an order of
 magnitude more complicated than C--harder to read, and much harder to
 write, unless one is a true expert. In mpl it brings reliance on the CXX
 library, which Mike D. has had to help maintain.  And if it does
 increase compiler specificity, that's bad.


 This gets to the recruitment issue, which is one of the most important
problems I see numpy facing. I personally have contributed a lot of code to
NumPy *in spite of* the fact it's in C. NumPy being in C instead of C++ was
the biggest negative point when I considered whether it was worth
contributing to the project. I suspect there are many programmers out there
who are skilled in low-level, high-performance C++, who would be willing to
contribute, but don't want to code in C.

This is a really important issue, because accessibility is the essential
reason why I am so strongly against

Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-17 Thread David Cournapeau
On 18 Feb 2012 03:53, Charles R Harris charlesr.har...@gmail.com wrote:



 On Fri, Feb 17, 2012 at 7:29 PM, David Cournapeau courn...@gmail.com
wrote:


 On 18 Feb 2012 00:58, Charles R Harris charlesr.har...@gmail.com wrote:


 
 
 
  On Fri, Feb 17, 2012 at 4:44 PM, David Cournapeau courn...@gmail.com
wrote:
 
  I don't think c++ has any significant advantage over c for high
performance libraries. I am not convinced by the number of people argument
either: it is not my experience that c++ is easier to maintain in an open
source context, where the level of people is far from consistent. I doubt
many people did not contribute to numpy because it is in c instead of c++.
While this is somehow subjective, there are reasons that c is much more
common than c++ in that context.
 
 
  I think C++ offers much better tools than C for the sort of things in
Numpy. The compiler will take care of lots of things that now have to be
hand crafted and I wouldn't be surprised to see the code size shrink by a
significant factor.

 There are two arguments here: that the c code in numpy could be improved,
and that c++ is the best way to do it. Nobody so far has argued against the
first argument. I think there is a lot of room to improve things while
still being in C.

 You say that the compiler would take care of a lot of things: so far,
the main thing that has been mentioned is raii. While it is certainly a
useful concept, I find it extremely difficult to use correctly in real
applications. Things that are simple to do on simple examples become really
hard to deal with when features start to interact with each other (which is
always the case in c++). Writing robust code that is exception safe with the stl
requires a lot of knowledge. I don't have this knowledge. I have no doubt
Mark has this knowledge. Does anyone else on this list?


 I have the sense you have written much in C++. Exception handling is
maybe one of the weakest aspects of C, that is, it basically doesn't have
any. The point is, I'd rather not *have* to worry much about the C/C++ side
of things, and I think once a solid foundation is in place I won't have to
nearly as much.

 Back in the late 80's I used rather nice Fortran and C++ compilers for
writing code to run in extended DOS (the dos limit was 640 KB at that
time). They were written in - wait for it - Pascal. The authors explained
this seemingly odd decision by claiming that Pascal was better for bigger
projects than C, and I agreed with them ;) Now you can point to Linux,
which is 30 million + lines of C, but that is rather exceptional and the
barriers to entry at this point are pretty darn high. My own experience is
that beginners can seldom write more than a page of C and get it right,
mostly because of pointers. Now C++ has a ton of subtleties and one needs
to decide up front what parts to use and what not, but once a well designed
system is in place, many things become easier because a lot of housekeeping
is done for you.

 My own concern here is that the project is bigger than Mark thinks and he
might get sucked off into a sideline, but I'd sure like to see the
experiment made.

  I would much rather move most part to cython to solve subtle ref
counting issues, typically.
 
 
  Not me, I'd rather write most stuff in C/C++ than Cython, C is cleaner
;) Cython good for the Python interface, but once past that barrier C is
easier, and C++ has lots of useful things.
 
  The only way that i know of to have a stable and usable abi is to
wrap the c++ code in c. Wrapping c++ libraries in python  has always been a
pain in my experience. How are template or exceptions handled across
languages ? it will also be a significant issue on windows with open source
compilers.
 
  Interestingly, the api from clang exported to other languages is in
c...
 
 
  The api isn't the same as the implementation language. I wouldn't
prejudge these issues, but some indication of how they would be solved
might be helpful.

 I understand that api and implementation language are not the same: you
just quoted the part where I was mentioning it :)

 Assuming a c++ implementation with a c api, how will you deal with
templates ? How will you deal with exceptions ? How will you deal with
exceptions crossing dll/so boundaries between different compilers, which is a very
common situation in our community ?


 None of these strike me as relevant, I mean, they are internals, not api
problems, and shouldn't be visible to the user. How Mark would implement
the C++ API, as opposed to the C API I don't know, but since both would be
there I don't see the problem. But really, we need more details on how
these things would work.

I don't understand why you think this is not relevant ? If numpy is in c++,
with a C API, most users of the numpy C/C++ API will use the C API, at least at
first, since most of them are in C. Changes or restrictions on how this API
can be used are visible.

To be more concrete, if numpy is built by MS compiler, and an exception is
thrown

Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-17 Thread David Cournapeau
On 18 Feb 2012 04:37, Charles R Harris charlesr.har...@gmail.com wrote:



 On Fri, Feb 17, 2012 at 9:18 PM, David Cournapeau courn...@gmail.com
wrote:


 On 18 Feb 2012 03:53, Charles R Harris charlesr.har...@gmail.com wrote:


 
 
 
  On Fri, Feb 17, 2012 at 7:29 PM, David Cournapeau courn...@gmail.com
wrote:
 
 
  On 18 Feb 2012 00:58, Charles R Harris charlesr.har...@gmail.com wrote:
 
 
  
  
  
   On Fri, Feb 17, 2012 at 4:44 PM, David Cournapeau 
courn...@gmail.com wrote:
  
   I don't think c++ has any significant advantage over c for high
performance libraries. I am not convinced by the number of people argument
either: it is not my experience that c++ is easier to maintain in an open
source context, where the level of people is far from consistent. I doubt
many people did not contribute to numpy because it is in c instead of c++.
While this is somehow subjective, there are reasons that c is much more
common than c++ in that context.
  
  
   I think C++ offers much better tools than C for the sort of things
in Numpy. The compiler will take care of lots of things that now have to be
hand crafted and I wouldn't be surprised to see the code size shrink by a
significant factor.
 
  There are two arguments here: that the c code in numpy could be improved,
and that c++ is the best way to do it. Nobody so far has argued against the
first argument. I think there is a lot of room to improve things while
still being in C.

  You say that the compiler would take care of a lot of things: so far,
the main thing that has been mentioned is raii. While it is certainly a
useful concept, I find it extremely difficult to use correctly in real
applications. Things that are simple to do on simple examples become really
hard to deal with when features start to interact with each other (which is
always the case in c++). Writing robust code that is exception safe with the stl
requires a lot of knowledge. I don't have this knowledge. I have no doubt
Mark has this knowledge. Does anyone else on this list?
 
 
  I have the sense you have written much in C++. Exception handling is
maybe one of the weakest aspects of C, that is, it basically doesn't have
any. The point is, I'd rather not *have* to worry much about the C/C++ side
of things, and I think once a solid foundation is in place I won't have to
nearly as much.
 
  Back in the late 80's I used rather nice Fortran and C++ compilers for
writing code to run in extended DOS (the dos limit was 640 KB at that
time). They were written in - wait for it - Pascal. The authors explained
this seemingly odd decision by claiming that Pascal was better for bigger
projects than C, and I agreed with them ;) Now you can point to Linux,
which is 30 million + lines of C, but that is rather exceptional and the
barriers to entry at this point are pretty darn high. My own experience is
that beginners can seldom write more than a page of C and get it right,
mostly because of pointers. Now C++ has a ton of subtleties and one needs
to decide up front what parts to use and what not, but once a well designed
system is in place, many things become easier because a lot of housekeeping
is done for you.
 
  My own concern here is that the project is bigger than Mark thinks and
he might get sucked off into a sideline, but I'd sure like to see the
experiment made.
 
   I would much rather move most part to cython to solve subtle ref
counting issues, typically.
  
  
   Not me, I'd rather write most stuff in C/C++ than Cython, C is
cleaner ;) Cython good for the Python interface, but once past that barrier
C is easier, and C++ has lots of useful things.
  
   The only way that i know of to have a stable and usable abi is to
wrap the c++ code in c. Wrapping c++ libraries in python  has always been a
pain in my experience. How are template or exceptions handled across
languages ? it will also be a significant issue on windows with open source
compilers.
  
   Interestingly, the api from clang exported to other languages is
in c...
  
  
   The api isn't the same as the implementation language. I wouldn't
prejudge these issues, but some indication of how they would be solved
might be helpful.
 
  I understand that api and implementation language are not the same:
you just quoted the part where I was mentioning it :)

  Assuming a c++ implementation with a c api, how will you deal with
templates ? How will you deal with exceptions ? How will you deal with
exceptions crossing dll/so boundaries between different compilers, which is a very
common situation in our community ?
 
 
  None of these strike me as relevant, I mean, they are internals, not
api problems, and shouldn't be visible to the user. How Mark would
implement the C++ API, as opposed to the C API I don't know, but since both
would be there I don't see the problem. But really, we need more details on
how these things would work.

 I don't understand why you think this is not relevant ? If numpy is in
c++, with a C API, most users of numpy C/C++ API

Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-17 Thread David Cournapeau
On 18 Feb 2012 06:18, Christopher Jordan-Squire cjord...@uw.edu wrote:

 On Fri, Feb 17, 2012 at 8:30 PM, Sturla Molden stu...@molden.no wrote:
 
 
   On 18 Feb 2012 at 05:01, Jason Grout jason-s...@creativetrax.com wrote:
 
  On 2/17/12 9:54 PM, Sturla Molden wrote:
  We would have to write a C++ programming tutorial that is based on
 Python knowledge instead of C knowledge.
 
  I personally would love such a thing.  It's been a while since I did
  anything nontrivial on my own in C++.
 
 
  One example: How do we code multiple return values?
 
  In Python:
  - Return a tuple.
 
  In C:
  - Use pointers (evilness)
 
  In C++:
  - Return a std::tuple, as you would in Python.
  - Use references, as you would in Fortran or Pascal.
  - Use pointers, as you would in C.
 
  C++ textbooks always pick the last...
 
  I would show the first and the second method, and perhaps intentionally
forget the last.
 
  Sturla
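
(To make the three options above concrete, a minimal sketch, assuming C++11
for std::tuple; all names are made up for illustration:

#include <tuple>

// 1) Return a std::tuple, as you would return a tuple in Python.
std::tuple<double, double> minmax_tuple(double a, double b) {
    return (a < b) ? std::make_tuple(a, b) : std::make_tuple(b, a);
}

// 2) Use reference output parameters, as in Fortran or Pascal.
void minmax_ref(double a, double b, double& lo, double& hi) {
    lo = (a < b) ? a : b;
    hi = (a < b) ? b : a;
}

// 3) Use pointers, as you would in C.
void minmax_ptr(double a, double b, double* lo, double* hi) {
    *lo = (a < b) ? a : b;
    *hi = (a < b) ? b : a;
}
)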
 

 I can add my own 2 cents about cython vs. C vs. C++, based on summer
 coding experiences.

 I was an intern at Enthought, sharing an office with Mark W. (Which
 was a treat. I recommend you all quit your day jobs and haunt whatever
 office Mark is inhabiting.) I was trying to optimize some code and
  that led to experimenting with both cython and C.

 Dealing with the C internals of numpy was frustrating. Since C doesn't
 have templating but numpy kinda needs it, instead python scripts go
 over and manually perform templating. Not the most obvious thing.
 There were other issues  in the background--including C doesn't allow
 for abstraction (i.e. easy to read), lots of pointer-fu is required,
 and the C API is lightly documented and already plenty difficult.

Please understand that the argument is not to maintain a status quo.

Lack of API documentation, internals that need significant work are
certainly issues. I fail to see how writing in C++ will solve the
documentation issues.

On the abstraction side of things, let's agree to disagree. Plenty of
complex projects are written in both languages, which makes this a
mostly subjective matter.


 On the flip side, cython looked pretty...but I didn't get the
 performance gains I wanted, and had to spend a lot of time figuring
 out if it was cython, needing to add types, buggy support for numpy,
 or actually the algorithm. The C files generated by cython were
 enormous and difficult to read. They really weren't meant for human
 consumption. As Sturla has said, regardless of the quality of the
 current product, it isn't stable.

Sturla represents only himself on this issue. Cython is widely held as a
successful and very useful tool. Many more projects in the scipy community
uses cython compared to C++.

And even if it looks friendly
 there's magic going on under the hood. Magic means it's hard to
 diagnose and fix problems. At least one very smart person has told me
 they find cython most useful for wrapping C/C++ libraries and exposing
 them to python, which is a far cry from library writing. (Of course
 Wes McKinney, a cython evangelist, uses it all over his pandas
 library.)

I am not very smart, but this is certainly close to what I had in mind as
well :) As you know, the lack of a clear abstraction between the C internals
and the Python wrapping is one of the major issues in numpy. Cython is
certainly one of the most capable tools out there to avoid tedious
reference-counting bug chasing.


 In comparison, there are a number of high quality, performant,
 open-source C++ based array libraries out there with very friendly
 API's. Things like eigen
 (http://eigen.tuxfamily.org/index.php?title=Main_Page) and Armadillo
 (http://arma.sourceforge.net/). They seem to have plenty of users and
 more devs than

eigen is a typical example of code I hope numpy will never be close to.
This is again quite subjective, but it also shows that we have quite
different ideas on what maintainable/readable code means. Which is of
course quite alright. But it means a choice needs to be made. If a majority
of people find eigen more readable than a well written C library, then I
don't think anyone can reasonably argue against going to c++.


 On the broader topic of recruitment...sure, cython has a lower barrier
 to entry than C++. But there are many, many more C++ developers and
 resources out there than cython resources. And it likely will stay
 that way for quite some

I may not have explained it very well: my whole point is that we don't
recruit people, where I understand recruit as hiring full-time,
professional programmers. We need more people who can casually spend a few
hours - typically grad students, scientists with an itch. There is no doubt
that more professional programmers know c++ compared to C. But a community
project like numpy has different requirements than a professional project.

David

 -Chris
 
 
 
 
 
 
 
 
 
 
 
 
  ___
  NumPy-Discussion mailing list
  NumPy-Discussion@scipy.org
  

Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-18 Thread David Cournapeau
On 18 Feb 2012 11:25, Robert Kern robert.k...@gmail.com wrote:

 On Sat, Feb 18, 2012 at 04:54, Charles R Harris
 charlesr.har...@gmail.com wrote:

  I found this, which references 0mq (used by ipython) as an example of
a C++
  library with a C interface. It seems enums can have different sizes in
  C/C++, so that is something to watch.

 One of the ways they manage to do this is by scrupulously avoiding
 exceptions even in the internal, never-touches-C zone.

I took a superficial look at zeromq 2.x sources: it looks like they don't
use much of the stl (beyond vector and some trivial usages of algorithm). I
wonder if this is linked ?

FWIW, I would be fine with using such a subset in numpy.

David
 --
 Robert Kern

 I have come to believe that the whole world is an enigma, a harmless
 enigma that is made terrible by our own mad attempt to interpret it as
 though it had an underlying truth.
   -- Umberto Eco
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-18 Thread David Cournapeau
On Sat, Feb 18, 2012 at 8:45 PM, Charles R Harris
charlesr.har...@gmail.com wrote:


 On Sat, Feb 18, 2012 at 1:39 PM, Matthew Brett matthew.br...@gmail.com
 wrote:

 Hi,

 On Sat, Feb 18, 2012 at 12:35 PM, Charles R Harris
 charlesr.har...@gmail.com wrote:
 
 
  On Sat, Feb 18, 2012 at 12:21 PM, Matthew Brett
  matthew.br...@gmail.com
  wrote:
 
  Hi.
 
  On Sat, Feb 18, 2012 at 12:18 AM, Christopher Jordan-Squire
  cjord...@uw.edu wrote:
   On Fri, Feb 17, 2012 at 11:31 PM, Matthew Brett
   matthew.br...@gmail.com wrote:
   Hi,
  
   On Fri, Feb 17, 2012 at 10:18 PM, Christopher Jordan-Squire
   cjord...@uw.edu wrote:
   On Fri, Feb 17, 2012 at 8:30 PM, Sturla Molden stu...@molden.no
   wrote:
  
  
    On 18 Feb 2012 at 05:01, Jason Grout jason-s...@creativetrax.com wrote:
  
   On 2/17/12 9:54 PM, Sturla Molden wrote:
   We would have to write a C++ programming tutorial that is based
   on
    Python knowledge instead of C knowledge.
  
   I personally would love such a thing.  It's been a while since I
   did
   anything nontrivial on my own in C++.
  
  
   One example: How do we code multiple return values?
  
   In Python:
   - Return a tuple.
  
   In C:
   - Use pointers (evilness)
  
   In C++:
   - Return a std::tuple, as you would in Python.
   - Use references, as you would in Fortran or Pascal.
   - Use pointers, as you would in C.
  
   C++ textbooks always pick the last...
  
   I would show the first and the second method, and perhaps
   intentionally forget the last.
  
   Sturla
  
  
   On the flip side, cython looked pretty...but I didn't get the
   performance gains I wanted, and had to spend a lot of time figuring
   out if it was cython, needing to add types, buggy support for
   numpy,
   or actually the algorithm.
  
   At the time, was the numpy support buggy?  I personally haven't had
   many problems with Cython and numpy.
  
  
   It's not that the support WAS buggy, it's that it wasn't clear to me
   what was going on and where my performance bottleneck was. Even after
   microbenchmarking with ipython, using timeit and prun, and using the
   cython code visualization tool. Ultimately I don't think it was
   cython, so perhaps my comment was a bit unfair. But it was
   unfortunately difficult to verify that. Of course, as you say,
   diagnosing and solving such issues would become easier to resolve
   with
   more cython experience.
  
   The C files generated by cython were
   enormous and difficult to read. They really weren't meant for human
   consumption.
  
   Yes, it takes some practice to get used to what Cython will do, and
   how to optimize the output.
  
   As Sturla has said, regardless of the quality of the
   current product, it isn't stable.
  
   I've personally found it more or less rock solid.  Could you say
   what
   you mean by it isn't stable?
  
  
   I just meant what Sturla said, nothing more:
  
   Cython is still 0.16, it is still unfinished. We cannot base NumPy
   on
   an unfinished compiler.
 
  Y'all mean, it has a zero at the beginning of the version number and
  it is still adding new features?  Yes, that is correct, but it seems
  more reasonable to me to phrase that as 'active development' rather
  than 'unstable', because they take considerable care to be backwards
  compatible, have a large automated Cython test suite, and a major
  stress-tester in the Sage test suite.
 
 
  Matthew,
 
  No one in their right mind would build a large performance library using
  Cython, it just isn't the right tool. For what it was designed for -
  wrapping existing c code or writing small and simple things close to
  Python
  - it does very well, but it was never designed for making core C/C++
  libraries and in that role it just gets in the way.

 I believe the proposal is to refactor the lowest levels in pure C and
  move some or most of the library superstructure to Cython.


 Go for it.

The proposal of moving to a core C + cython has been discussed by
multiple contributors. It is certainly a valid proposal. *I* have
worked on this (npymath, separate compilation), although certainly not
as much as I would have wanted to. I think much can be done in that
vein. Using the "shut up if you don't do it" argument is a straw man (and
uncalled for).

Moving away from subjective considerations on how to do things, is
there a way that one can see the pros/cons of each approach? For the
C++ approach, I would really like to see which C++ is being
considered. I was. Once the choice is done, going back would be quite
hard, so I can't see how we could go for it just because some people
prefer it without very clear technical arguments.

Saying that C++ is more readable, or scales better, is frankly a very
weak and too subjective argument to be convincing. There are too many projects
way more complex than numpy that have been done in either C or C++.

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org

Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-18 Thread David Cournapeau
On Sat, Feb 18, 2012 at 9:40 PM, Charles R Harris
charlesr.har...@gmail.com wrote:


 Well, we already have code obfuscation (DOUBLE_your_pleasure,
 FLOAT_your_boat), so we might as well let the compiler handle it.

Yes, those are not great, but on the other hand, it is not that
fundamental an issue IMO.

Iterators as we have them in NumPy are something that is clearly limited
by C. Writing the neighborhood iterator is the only case where I
really felt that C++ *could* be a significant improvement. I use
*could* because writing iterators in C++ is hard, and they will be much
harder to read (I find both boost and STL - e.g. stlport - iterators
to be close to write-only code). But there is the question of how you
can make C++-based iterators available in C. I would be interested in
a simple example of how this could be done, ignoring all the other
issues (portability, exceptions, etc…).

The STL is also potentially compelling, but that's where we get into my
"beware of the dragons" area of C++. Portability loss, compilation
time increases and warts are significant there.
scipy.sparse.sparsetools has been a source of issues that is quite
high compared to its proportion of the scipy code (we *do* have
some hard-won experience with C++-related issues).


 Jim Hugunin was a keynote speaker at one of the scipy conventions. At dinner
 he said that if he was to do it again he would use managed code ;) I don't
 propose we do that, but tools do advance.

In an ideal world, we would have a better language than C++ that can
be spit out as C for portability. I have looked for a way to do this
for as long as I have been contributing to NumPy (I have looked at
ooc, D, coccinelle at various stages). I believe the best way is
actually in the vein of FFTW: written in a very high level language
(OCAML) for the hard part, and spitting out C. This is better than C++
in many ways - this is also clearly not realistic :)

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-18 Thread David Cournapeau
On Sat, Feb 18, 2012 at 10:50 PM, Sturla Molden stu...@molden.no wrote:

   In an ideal world, we would have a better language than C++ that can
 be spit out as  C for portability.

 What about a statically typed Python? (That is, not Cython.) We just
 need to make the compiler :-)

There are better languages than C++ that have most of the technical
benefits stated in this discussion (rust and D being the most
obvious ones), but whose usage is unrealistic today for various
reasons: knowledge, availability on esoteric platforms, etc… Creating a new
language is completely ridiculous.

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] How a transition to C++ could work

2012-02-19 Thread David Cournapeau
Hi Mark,

thank you for joining this discussion.

On Sun, Feb 19, 2012 at 7:18 AM, Mark Wiebe mwwi...@gmail.com wrote:
 The suggestion of transitioning the NumPy core code from C to C++ has
 sparked a vigorous debate, and I thought I'd start a new thread to give my
 perspective on some of the issues raised, and describe how such a transition
 could occur.

 First, I'd like to reiterate the gcc rationale for their choice to switch:
 http://gcc.gnu.org/wiki/gcc-in-cxx#Rationale

 In particular, these points deserve emphasis:

 The C subset of C++ is just as efficient as C.
 C++ supports cleaner code in several significant cases.
 C++ makes it easier to write cleaner interfaces by making it harder to break
 interface boundaries.
 C++ never requires uglier code.

I think those arguments will not be very useful: they are subjective,
and unlikely to convince people who prefer C to C++.


 There are concerns about ABI/API interoperability and interactions with C++
 exceptions. I've dealt with these types of issues on enough platforms to
 know that while they're important, they're a lot easier to handle than the
 issues with Fortran, BLAS, and LAPACK in SciPy. My experience has been that
 providing a C API from a C++ library is no harder than providing a C API
 from a C library.

This needs more details. I have some experience in both areas as well,
and mine is quite different. Reiterating a few examples that worry me:
  - how can you ensure that exceptions happening in C++ will never
cross different .so/.dll boundaries ? How can one make sure C++ extensions built
by different compilers can work ? Is not using exceptions like it is
done in zeromq acceptable ? (would be nice to find out more about the
decisions made by the zeromq team about their usage of C++). I cannot
find a recent example, but I have seen errors similar to
this (http://software.intel.com/en-us/forums/showthread.php?t=42940)
quite a few times.
  - how can you expose in C features that rely heavily on C++ ? I would
expect you would like to use templates for iterators in numpy - how
can you make them available to 3rd party extensions without requiring
C++.


 It's worth comparing the possibility of C++ versus the possibility of other
 languages, and the ones that have been suggested for consideration are D,
 Cython, Rust, Fortran 2003, Go, RPython, C# and Java. The target language
 has to interact naturally with the CPython API. It needs to provide direct
 access to all the various sizes of signed int, unsigned int, and float. It
 needs to have mature compiler support wherever we want to deploy NumPy.
 Taken together, these requirements eliminate a majority of these
 possibilities. From these criteria, the only languages which seem to have a
 clear possibility for the implementation of Numpy are C, C++, and D. For D,
 I suspect the tooling is not mature enough, but I'm not 100% certain of
 that.

While I agree that no other language is realistic, staying in C has
the nice advantage that we can more easily use one of them if they
mature (rust/D - go, rpython, C#/java can be dismissed for fundamental
technical reasons right away). This is not a very strong argument
against using C++, obviously.


 1) Immediately after branching for 1.7, we minimally patch all the .c files
 so that they can build with a C++ compiler and with a C compiler at the same
 time. Then we rename all .c - .cpp, and update the build systems for C++.
 2) During the 1.8 development cycle, we heavily restrict C++ feature usage.
 But, where a feature implementation would be arguably easier and less
 error-prone with C++, we allow it. This is a period for learning about C++
 and how it can benefit NumPy.
 3) After the 1.8 release, the community will have developed more experience
 with C++, and will be in a better position to discuss a way forward.

A step that would be useful sooner rather than later is one where
numpy has been split into smaller extensions (instead of
multiarray/ufunc, essentially). This would help avoiding recompilation
of lots of code for any small change. It is already quite painful with
C, but with C++, it will be unbearable. This can be done in C, and
would be useful whether the decision to move to C++ is accepted or
not.

cheers,

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-19 Thread David Cournapeau
On Sun, Feb 19, 2012 at 8:08 AM, Mark Wiebe mwwi...@gmail.com wrote:
 On Sat, Feb 18, 2012 at 4:24 PM, David Cournapeau courn...@gmail.com
 wrote:

 On Sat, Feb 18, 2012 at 9:40 PM, Charles R Harris
 charlesr.har...@gmail.com wrote:

 
  Well, we already have code obfuscation (DOUBLE_your_pleasure,
  FLOAT_your_boat), so we might as well let the compiler handle it.

 Yes, those are not great, but on the other hand, it is not that a
 fundamental issue IMO.

  Iterators as we have them in NumPy are something that is clearly limited
  by C. Writing the neighborhood iterator is the only case where I
  really felt that C++ *could* be a significant improvement. I use
  *could* because writing iterators in C++ is hard, and they will be much
  harder to read (I find both boost and STL - e.g. stlport - iterators
  to be close to write-only code). But there is the question of how you
  can make C++-based iterators available in C. I would be interested in
  a simple example of how this could be done, ignoring all the other
  issues (portability, exceptions, etc…).

  The STL is also potentially compelling, but that's where we get into my
  "beware of the dragons" area of C++. Portability loss, compilation
  time increases and warts are significant there.
  scipy.sparse.sparsetools has been a source of issues that is quite
  high compared to its proportion of the scipy code (we *do* have
  some hard-won experience with C++-related issues).


 These standard library issues were definitely valid 10 years ago, but all
 the major C++ compilers have great C++98 support now.

The STL varies significantly between platforms; I believe this is still the
case today. Do you know the status of the STL on Blue Gene, or on small
devices ? We unfortunately cannot restrict ourselves to one well known
implementation (e.g. STLPort).

 Is there a specific
 target platform/compiler combination you're thinking of where we can do
 tests on this? I don't believe the compile times are as bad as many people
 suspect, can you give some simple examples of things we might do in NumPy
 you expect to compile slower in C++ vs C?

Switching from gcc to g++ on the same codebase should not change
compilation times much. We should test, but that's not what worries me.
What worries me is when we start using C++-specific code, STL and co.
Today, scipy.sparse.sparsetools takes half of the build time of the
whole of scipy, and it does not even use fancy features. It also takes GB
of RAM when building in parallel.

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] How a transition to C++ could work

2012-02-19 Thread David Cournapeau
On Sun, Feb 19, 2012 at 9:19 AM, Mark Wiebe mwwi...@gmail.com wrote:
 On Sun, Feb 19, 2012 at 2:56 AM, David Cournapeau courn...@gmail.com
 wrote:

 Hi Mark,

 thank you for joining this discussion.

 On Sun, Feb 19, 2012 at 7:18 AM, Mark Wiebe mwwi...@gmail.com wrote:
  The suggestion of transitioning the NumPy core code from C to C++ has
  sparked a vigorous debate, and I thought I'd start a new thread to give
  my
  perspective on some of the issues raised, and describe how such a
  transition
  could occur.
 
  First, I'd like to reiterate the gcc rationale for their choice to
  switch:
  http://gcc.gnu.org/wiki/gcc-in-cxx#Rationale
 
  In particular, these points deserve emphasis:
 
  The C subset of C++ is just as efficient as C.
  C++ supports cleaner code in several significant cases.
  C++ makes it easier to write cleaner interfaces by making it harder to
  break
  interface boundaries.
  C++ never requires uglier code.

 I think those arguments will not be very useful: they are subjective,
 and unlikely to convince people who prefer C to C++.


 They are arguments from a team which implement both a C and a C++ compiler.
 In the spectrum of possible authorities on the matter, they rate about as
 high as I can imagine.

There are quite a few people who are as authoritative and think
those arguments are not very strong. They are as unlikely to change
your mind as the gcc arguments are to convince me, I am
afraid.


 This is a necessary part of providing a C API, and is included as a
 requirement of doing that. All C++ libraries which expose a C API deal with
 this.

The only two examples given so far of a C library around C++
code (clang and zeromq) do not use exceptions. Can you provide an
example of a C++ library that has a C API and does use exceptions ?

If not, I would like to know the technical details if you don't mind
expanding on them.
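
(For reference, the pattern usually meant by "a C API on top of a C++ core"
is to catch everything at the boundary and translate it into error codes, so
that no exception ever propagates into C callers. A minimal sketch with
hypothetical names, not numpy's actual API:

#include <new>

namespace core {
// Hypothetical C++ implementation object.
struct Iterator {
    void next() { /* may throw std::bad_alloc, std::runtime_error, ... */ }
};
}

extern "C" {

typedef struct npyx_iter npyx_iter;   // opaque handle seen by C code

// Every exported function catches all C++ exceptions and returns a status
// code instead, so exceptions never cross the shared library boundary.
int npyx_iter_next(npyx_iter* it) {
    try {
        reinterpret_cast<core::Iterator*>(it)->next();
        return 0;                      // success
    } catch (const std::bad_alloc&) {
        return -2;                     // out of memory
    } catch (...) {
        return -1;                     // any other C++ exception
    }
}

}  // extern "C"
)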



 How can one make sure C++ extensions built
 by different compilers can work ?


 This is no different from the situation in C. Already in C on Windows, one
 can't build NumPy with a different version of Visual C++ than the one used
 to build CPython.

This is a different situation. On windows, the mismatch between VS versions is
due to the way win32 has been used by python itself - it could
actually be fixed eventually by python (there are efforts in that
regard). It is not a language issue.

Except for that case, numpy has a pretty good record of allowing
people to mix and match compilers. Using mingw on windows and intel
compilers on linux are the typical cases, but not the only ones.


 I would
  expect you would like to use templates for iterators in numpy - how
 can you make them available to 3rd party extensions without requiring
 C++.


 Yes, something like the nditer is a good example. From C, it would have to
 retain an API in the current style, but C++ users could gain an
 easier-to-use variant.

Providing an official C++ library on top of the current C API would
certainly be nice for people who prefer C++ to C. But this is quite
different from using C++ at the core.

The current way iterators work would be very hard (if at all possible
?) to rewrite in idiomatic C++ while keeping even API compatibility
with the existing C one. For numpy 2.0, we can somewhat relax on this.
If it is not too time consuming, could you show a simplified example
of how it would work to write the iterator in C++ while providing a C
API in the spirit of what we have now ?
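
(For what it is worth, the general shape of such a wrapping - a C++ template
core behind an opaque C handle, explicitly instantiated per element type -
could look like the sketch below. Names are hypothetical, and error and
exception handling are ignored here:

#include <cstddef>
#include <new>

// C++ core: a trivial strided iterator template (illustrative only).
template <typename T>
class StridedIter {
public:
    StridedIter(T* data, std::ptrdiff_t stride, std::size_t n)
        : p_(data), stride_(stride), left_(n) {}
    bool done() const { return left_ == 0; }
    void next() { p_ += stride_; --left_; }   // stride is in elements here
    T* value() { return p_; }
private:
    T* p_;
    std::ptrdiff_t stride_;
    std::size_t left_;
};

// C API: one opaque handle per element type, instantiating the template
// behind the scenes (only the double version is shown).
extern "C" {

typedef struct npyx_diter npyx_diter;   // opaque to C callers

npyx_diter* npyx_diter_new(double* data, long stride, unsigned long n) {
    StridedIter<double>* it =
        new (std::nothrow) StridedIter<double>(data, stride, n);
    return reinterpret_cast<npyx_diter*>(it);
}
int npyx_diter_done(npyx_diter* it) {
    return reinterpret_cast<StridedIter<double>*>(it)->done();
}
void npyx_diter_next(npyx_diter* it) {
    reinterpret_cast<StridedIter<double>*>(it)->next();
}
double* npyx_diter_value(npyx_diter* it) {
    return reinterpret_cast<StridedIter<double>*>(it)->value();
}
void npyx_diter_free(npyx_diter* it) {
    delete reinterpret_cast<StridedIter<double>*>(it);
}

}  // extern "C"
)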

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proposed Roadmap Overview

2012-02-19 Thread David Cournapeau
On Sun, Feb 19, 2012 at 9:28 AM, Mark Wiebe mwwi...@gmail.com wrote:

 Is there anyone who uses a blue gene or small device which needs up-to-date
 numpy support, that I could talk to directly? We really need a list of
 supported platforms on the numpy wiki we can refer to when discussing this
 stuff, it all seems very nebulous to me.

They may not need an up to date numpy version now, but if stopping
support for them is a requirement for C++, it must be kept in mind. I
actually suspect Travis to have more details on the big iron side of
things. On the small side of things:
http://projects.scipy.org/numpy/ticket/1969

This may not seem very useful - but that's part of what an open
source project is all about in my mind.


 Particular styles of using templates can cause this, yes. To properly do
 this kind of advanced C++ library work, it's important to think about the
 big-O notation behavior of your template instantiations, not just the big-O
 notation of run-time. C++ templates have a turing-complete language (which
 is said to be quite similar to haskell, but spelled vastly different)
 running at compile time in them. This is what gives template
 meta-programming in C++ great power, but since templates weren't designed
 for this style of programming originally, template meta-programming is not
 very easy.

scipy.sparse.sparsetools is quite straightforward in its usage of
templates (would be great if you could suggest improvement BTW, e.g.
scipy/sparse/sparsetools/csr.h), and does not by itself use any
template meta-programming.

I like that numpy can be built in a few seconds (at least without
optimization), and consider this to be a useful feature.

cheers,

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] How a transition to C++ could work

2012-02-19 Thread David Cournapeau
On Sun, Feb 19, 2012 at 9:52 AM, Mark Wiebe mwwi...@gmail.com wrote:
 On Sun, Feb 19, 2012 at 3:10 AM, Ben Walsh ben_w_...@yahoo.co.uk wrote:



  Date: Sun, 19 Feb 2012 01:18:20 -0600
  From: Mark Wiebe mwwi...@gmail.com
  Subject: [Numpy-discussion] How a transition to C++ could work
  To: Discussion of Numerical Python NumPy-Discussion@scipy.org
  Message-ID:
 
  CAMRnEmpVTmt=kdurpzktgui516oqtqd4vazm746hmpqgpfx...@mail.gmail.com
  Content-Type: text/plain; charset=utf-8
 
  The suggestion of transitioning the NumPy core code from C to C++ has
  sparked a vigorous debate, and I thought I'd start a new thread to give
  my
  perspective on some of the issues raised, and describe how such a
  transition could occur.
 
  First, I'd like to reiterate the gcc rationale for their choice to
  switch:
  http://gcc.gnu.org/wiki/gcc-in-cxx#Rationale
 
  In particular, these points deserve emphasis:
 
    - The C subset of C++ is just as efficient as C.
    - C++ supports cleaner code in several significant cases.
    - C++ makes it easier to write cleaner interfaces by making it harder
  to
    break interface boundaries.
    - C++ never requires uglier code.
 

 I think they're trying to solve a different problem.

 I thought the problem that numpy was trying to solve is make inner loops
 of numerical algorithms very fast. C is great for this because you can
 write C code and picture precisely what assembly code will be generated.


 What you're describing is also the C subset of C++, so your experience
 applies just as well to C++!


 C++ removes some of this advantage -- now there is extra code generated by
 the compiler to handle constructors, destructors, operators etc which can
 make a material difference to fast inner loops. So you end up just writing
 C-style anyway.


 This is in fact not true, and writing in C++ style can often produce faster
 code. A classic example of this is C qsort vs C++ std::sort. You may be
 thinking of using virtual functions in a class hierarchy, where a tradeoff
 between performance and run-time polymorphism is being done. Emulating the
 functionality that virtual functions provide in C will give similar
 performance characteristics as the C++ language feature itself.
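
(To make the qsort/std::sort point concrete: with std::sort the comparison is
visible to the compiler and can be inlined, while qsort calls through a
function pointer for every comparison. A minimal sketch:

#include <algorithm>
#include <cstddef>
#include <cstdlib>

static int cmp_double(const void* a, const void* b) {
    const double da = *static_cast<const double*>(a);
    const double db = *static_cast<const double*>(b);
    return (da > db) - (da < db);
}

void sort_c(double* data, std::size_t n) {
    // Every comparison goes through an indirect call to cmp_double.
    std::qsort(data, n, sizeof(double), cmp_double);
}

void sort_cpp(double* data, std::size_t n) {
    // operator< on double is known at compile time and can be inlined
    // into the generated sorting code.
    std::sort(data, data + n);
}
)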


 On the other hand, if your problem really is write lots of OO code with
 virtual methods and have it turned into machine code (probably like the
 GCC guys) then maybe C++ is the way to go.


  Managing the complexity of the dtype subsystem, the ufunc subsystem, the
  nditer component, and other parts of NumPy could benefit from C++. Not in a
  stereotypical "OO code with virtual methods" way, that is not how typical
  modern C++ is done.


 Some more opinions on C++:
 http://gigamonkeys.wordpress.com/2009/10/16/coders-c-plus-plus/

 Sorry if this all seems a bit negative about C++. It's just been my
 experience that C++ adds complexity while C keeps things nice and simple.


 Yes, there are lots of negative opinions about C++ out there, it's true.
 Just like there are negative opinions about C, Java, C#, and any other
 language which has become popular. My experience with regard to complexity
 and C vs C++ is that C forces the complexity of dealing with resource
 lifetimes out into all the code everyone writes, while C++ allows one to
 encapsulate that sort of complexity into a class which is small and more
 easily verifiable. This is about code quality, and the best quality C++ code
 I've worked with has been way easier to program in than the best quality C
 code I've worked with.

While I actually believe this to be true (very good C++ can be easier
to read/use than very good C), good C is also much more common than
good C++, at least in open source.

On the good C++ codebases you have been working on, could you rely on
everybody being a very good C++ programmer ? Because this will most
likely never happen for numpy. This is the crux of the argument from
an organizational POV: the variance in C++ code quality is much more
difficult to control. I have seen C++ code that is certainly much
poorer and more complex than numpy, to a point where not much could be
done to save the codebase.

cheers,

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] [Numpy] quadruple precision

2012-02-29 Thread David Cournapeau
On Wed, Feb 29, 2012 at 10:22 AM, Paweł Biernat pw...@wp.pl wrote:
 I am completely new to Numpy and I know only the basics of Python, to
 this point I was using Fortran 03/08 to write numerical code. However,
 I am starting a new large project of mine and I am looking forward to
 using Python to call some low level Fortran code responsible for most
 of the intensive number crunching. In this context I stumbled into
 f2py and it looks just like what I need, but before I start writing an
 app in mixture of Python and Fortran I have a question about numerical
 precision of variables used in numpy and f2py.

 Is there any way to interact with Fortran's real(16) (supported by gcc
 and Intel's ifort) data type from numpy? By real(16) I mean the
 binary128 type as in IEEE 754. (In C this data type is experimentally
 supported as __float128 (gcc) and _Quad (Intel's icc).) I have
 investigated the float128 data type, but it seems to work as binary64
 or binary80 depending on the architecture. If there is currently no
 way to interact with binary128, how hard would it be to patch the
 sources of numpy to add such data type? I am interested only in
 basic stuff, comparable in functionality to libmath.

 As said before, I have little knowledge of Python, Numpy and f2py, I
 am however, interested in investing some time in learing it and
 implementing the mentioned features, but only if there is any hope of
 succeeding.

Numpy does not have proper support for quadruple precision float
numbers, because very few implementations do (no common CPU handles it
in hardware, for example).

The float128 dtype is a bit confusingly named: the 128 refers to the padding
in memory, not to its real precision. It often (but not always)
refers to the long double of the underlying C implementation. The
latter depends on the OS, CPU and compiler.
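
(A quick way to check this from C/C++ on a given platform; on x86 Linux with
gcc the output shows a 16-byte storage size but only a 64-bit mantissa, i.e.
the 80-bit x87 extended type padded to 128 bits, not IEEE binary128:

#include <cfloat>
#include <cstdio>

int main() {
    std::printf("sizeof(long double) = %u bytes\n",
                static_cast<unsigned>(sizeof(long double)));
    std::printf("mantissa bits       = %d\n", LDBL_MANT_DIG);  // 64 on x86, 113 for binary128
    std::printf("decimal digits      = %d\n", LDBL_DIG);       // ~18 on x86, ~33 for binary128
    return 0;
}
)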

cheers,

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] C++ Example

2012-03-03 Thread David Cournapeau
On Sat, Mar 3, 2012 at 8:07 AM, Luis Pedro Coelho l...@cmu.edu wrote:
 Hi,

 I sort of missed the big C++ discussion, but I'd like to give some examples of
 how writing code can become much simpler if you base it on C++. This is from
 my mahotas package, which has a thin C++ wrapper around numpy's C API

 https://github.com/luispedro/mahotas/blob/master/mahotas/_morph.cpp

 and it implements multi-type greyscale erosion.


 // numpy::aligned_array wraps PyArrayObject*
 template <typename T>
 void erode(numpy::aligned_array<T> res,
            numpy::aligned_array<T> array,
            numpy::aligned_array<T> Bc) {

     // Release the GIL using RAII
     gil_release nogil;
     const int N = res.size();
     typename numpy::aligned_array<T>::iterator iter = array.begin();
     // this is adapted from scipy.ndimage.
     // it implements the convolution-like filtering.
     filter_iterator<T> filter(res.raw_array(),
                               Bc.raw_array(),
                               EXTEND_NEAREST,
                               is_bool(T()));
     const int N2 = filter.size();
     T* rpos = res.data();

     for (int i = 0; i != N; ++i, ++rpos, filter.iterate_both(iter)) {
         T value = std::numeric_limits<T>::max();
         for (int j = 0; j != N2; ++j) {
             T arr_val = T();
             filter.retrieve(iter, j, arr_val);
             value = std::min<T>(value, erode_sub(arr_val, filter[j]));
         }
         *rpos = value;
     }
 }

 If you compare this with the equivalent scipy.ndimage function, which is very
 good C code (but mostly write-only—in fact, ndimage has not been maintainable
 because it is so hard [at least for me, I've tried]):

The fact that this is good C is a matter of opinion :)

I don't think the code is comparable either - some of the stuff done
in the C code is done in the C++ code you are calling. The C code
could be significantly improved. Even more important here: almost none
of this code should be written anymore anyway, C++ or not. This is
really the kind of code that should be done in cython, as it is mostly
about wrapping C code into the python C API.

cheers,

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] dtype comparison, hash

2012-03-05 Thread David Cournapeau
On Tue, Jan 17, 2012 at 9:28 AM, Robert Kern robert.k...@gmail.com wrote:
 On Tue, Jan 17, 2012 at 05:11, Andreas Kloeckner
 li...@informa.tiker.net wrote:
 Hi Robert,

 On Fri, 30 Dec 2011 20:05:14 +, Robert Kern robert.k...@gmail.com 
 wrote:
 On Fri, Dec 30, 2011 at 18:57, Andreas Kloeckner
 li...@informa.tiker.net wrote:
  Hi Robert,
 
  On Tue, 27 Dec 2011 10:17:41 +, Robert Kern robert.k...@gmail.com 
  wrote:
  On Tue, Dec 27, 2011 at 01:22, Andreas Kloeckner
  li...@informa.tiker.net wrote:
   Hi all,
  
   Two questions:
  
   - Are dtypes supposed to be comparable (i.e. implement '==', '!=')?
 
  Yes.
 
   - Are dtypes supposed to be hashable?
 
  Yes, with caveats. Strictly speaking, we violate the condition that
  objects that equal each other should hash equal since we define == to
  be rather free. Namely,
 
    np.dtype(x) == x
 
  for all objects x that can be converted to a dtype.
 
    np.dtype(float) == np.dtype('float')
    np.dtype(float) == float
    np.dtype(float) == 'float'
 
  Since hash(float) != hash('float') we cannot implement
  np.dtype.__hash__() to follow the stricture that objects that compare
  equal should hash equal.
 
  However, if you restrict the domain of objects to just dtypes (i.e.
  only consider dicts that use only actual dtype objects as keys instead
  of arbitrary mixtures of objects), then the stricture is obeyed. This
  is a useful domain that is used internally in numpy.
 
  Is this the problem that you found?
 
  Thanks for the reply.
 
  It doesn't seem like this is our issue--instead, we're encountering two
  different dtype objects that claim to be float64, compare as equal, but
  don't hash to the same value.
 
  I've asked the user who encountered the issue to investigate, and I'll
  be back with more detail in a bit.

 I think we've run into this before and tried to fix it. Try to find
 the version of numpy the user has and a minimal example, if you can.

 This is what Thomas found:

 http://projects.scipy.org/numpy/ticket/2017

 It looks like the .flags attribute is different between np.uintp and
 np.uint32. The .flags attribute forms part of the hashed information
 about the dtype (or PyArray_Descr at the C-level).

 [~]
 |15 np.dtype(np.uintp).flags
 1536

 [~]
 |16 np.dtype(np.uint32).flags
 2048

 The same goes for np.intp and np.int32 in numpy 1.6.1 on OS X, so
 unlike the comment in the ticket, they do have different hashes for
 me.

 However, diving through the source a bit, I'm not entirely sure I
 trust the values being given at the Python level. It appears that the
 flags member of the PyArray_Descr struct is declared as a char.
 However, it is exposed as a T_INT member in the PyMemberDef table by
 direct addressing. Basically, a Python descriptor gets added to the
 np.dtype type that will look up sizeof(long) bytes from the starting
 position of the flags member in the struct. This includes 3 bytes of
 the following type_num member. Obviously, 2048 does not fit into a
 char. Nonetheless, the type_num is also part of the hash, so either
 the flags member or the type_num member is different between the two.

 Two bugs for the price of one!

Good catch !

So basically, the flag was changed from a char to an int and back to a
char, and some of the code did not follow.

I could not really follow the exact history from the log alone, but basically:
  - there is indeed a char vs int discrepancy (T_INT vs char)
  - in most dtype functions handling the flag variable, temporary
computations were made with an int (but every possible flag combination
can fit in a char)
  - quite a few usages of i instead of c in PyArg_ParseTuple and
Py_BuildValue.

Even after all those things, the original bug is still there, because uintp
and uint32 have different type numbers, even on 32 bits. I would actually
consider this a bug in PyArray_EquivTypes, but changing that now may
be quite disruptive. Shall I remove type_num from the hash input (in
which case the bug would be fixed)?
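
For reference, a minimal sketch of the reported behaviour at the Python
level (assuming a 32-bit build, where intp and int32 describe the same
layout; exact results depend on the build):

    import numpy as np

    a = np.dtype(np.intp)
    b = np.dtype(np.int32)
    print(a == b)              # True: the two dtypes are considered equivalent
    print(hash(a) == hash(b))  # reported False in #2017, since type_num differs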

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fixing PyArray_Descr flags member size, ABI vs pickling issue

2012-03-06 Thread David Cournapeau
On Tue, Mar 6, 2012 at 6:20 AM, Robert Kern robert.k...@gmail.com wrote:
 On Tue, Mar 6, 2012 at 03:53, David Cournapeau courn...@gmail.com wrote:
 Hi,

 This is following the discussion on bug
 http://projects.scipy.org/numpy/ticket/2017

 Essentially, there is a discrepancy between the actual type of the
 flags member in the dtype C structure (char) and how it is declared in
 the descriptor table (T_INT). The problem is that we are damned if we
 fix it, damned if we don't:
  - fixing T_INT to T_BYTE: flags in python is then fixed, but it breaks
 pickled numpy arrays

 Is the problem that T_BYTE returns a single-item string?

Yes (although it is actually what we want, instead of an int).

 My handwrapping skills are rusty (Cython has blissfully eradicated this
 from my memory), but aren't these T_INT/T_BYTE things just convenient
 shortcuts for exposing C struct members as Python attributes? Couldn't
 we just write a full getter function for returning the correct value,
 just returned as a Python int instead of a str.

You're right, I did not think about this solution. That's certainly
better than the two I suggested.

Thanks,

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fixing PyArray_Descr flags member size, ABI vs pickling issue

2012-03-06 Thread David Cournapeau
On Tue, Mar 6, 2012 at 1:25 PM, Travis Oliphant tra...@continuum.io wrote:
 Why do we want to return a single string char instead of an int?

 There is a need for more flags on the dtype object.   Using an actual 
 attribute call seems like the way to go.  This could even merge the contents 
 of two struct members so that we can add more flags but preserve ABI 
 compatibility.

Yes. The T_BYTE/T_INT issue is actually pretty minor compared to the
underlying issue (where we cast back and forth between int and char).
I will make a new PR that fixes everything but this exact point, and
will add an actual accessor if needed. Given that dtype.flags is
nonsensical as of today (at the python level), I would expect nobody
uses it.

cheers,

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] (2012) Accessing LAPACK and BLAS from the numpy C API

2012-03-06 Thread David Cournapeau
On Tue, Mar 6, 2012 at 2:57 PM, Sturla Molden stu...@molden.no wrote:
 On 05.03.2012 14:26, V. Armando Solé wrote:

 In 2009 there was a thread in this mailing list concerning the access to
 BLAS from C extension modules.

 If I have properly understood the thread:

 http://mail.scipy.org/pipermail/numpy-discussion/2009-November/046567.html

 the answer by then was that those functions were not exposed (only f2py
 functions).

 I just wanted to know if the situation has changed since 2009 because it
 is not uncommon that to optimize some operations one has to sooner or
 later access BLAS functions that are already wrapped in numpy (either
 from ATLAS, from the Intel MKL, ...)

 Why do you want to do this? It does not make your life easier to use
 NumPy or SciPy's Python wrappers from C. Just use BLAS directly from C
 instead.

Of course it does make his life easier. This way he does not have to
distribute his own BLAS/LAPACK/etc...

Please stop presenting as truth things which are at best highly
opinionated. You have already made such statements many times, and it is
not helpful at all.

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fixing PyArray_Descr flags member size, ABI vs pickling issue

2012-03-07 Thread David Cournapeau
On Tue, Mar 6, 2012 at 1:44 PM, Robert Kern robert.k...@gmail.com wrote:

 On Tue, Mar 6, 2012 at 18:25, Travis Oliphant tra...@continuum.io wrote:
  Why do we want to return a single string char instead of an int?

 I suspect just to ensure that any provided value fits in the range
 0..255. But that's easily done explicitly.


That was not even the issue in the end, my initial analysis was wrong.

In any case, I now have a new PR that fixes both the dtype.flags value and
the dtype hashing reported in #2017: https://github.com/numpy/numpy/pull/231

regards,

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] float96 on windows32 is float64?

2012-03-15 Thread David Cournapeau
On Thu, Mar 15, 2012 at 11:10 PM, Matthew Brett matthew.br...@gmail.comwrote:

 Hi,

 Am I right in thinking that float96 on windows 32 bit is a float64
 padded to 96 bits?


Yes


  If so, is it useful?


Yes: this is what allows you to use dtype to parse complex binary files
directly in numpy without having to care so much about those details. And
that's how it is defined on windows in any case (C standard only forces you
to have sizeof(long double) >= sizeof(double)).
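
As a hedged illustration of that parsing use case (the record layout and
field names below are made up; the itemsize of np.longdouble is
platform-dependent, e.g. 12 bytes on 32-bit Windows):

    import numpy as np

    # hypothetical record written by 32-bit Windows C code:
    #     struct { long double x; int n; };
    rec = np.dtype([('x', np.longdouble), ('n', np.int32)])
    print(rec.itemsize)            # the padding is carried by the dtype itself
    data = np.zeros(3, dtype=rec)  # or np.fromfile(f, dtype=rec) to read records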



  Has anyone got a windows64
 box to check float128 ?


Too lazy to check on my vm, but I am pretty sure it is 16 bytes on windows
64.

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Linking against MKL but still slow?

2012-03-26 Thread David Cournapeau
Hi Christoph,

On Mon, Mar 26, 2012 at 10:06 AM, Christoph Dann
ch.ro.d...@googlemail.comwrote:

 Dear list,

 so far I used Enthought's Python Distribution, which contains a compiled
 version of numpy linked against MKL. Now, I want to implement my own
 extensions to numpy, so I need to build numpy on my own. So, I installed
 Intel Parallel Studio including MKL and the C / Fortran compilers.


What do you mean by own extensions to NumPy ? If you mean building
extensions against the C API of NumPy, then you don't need to build your
own NumPy. Building NumPy with Intel Compilers and MKL is a non-trivial
process, so I would rather avoid it.
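
For instance, a minimal setup.py sketch (the extension name and source file
are hypothetical) that builds a C extension against an already installed
NumPy, without rebuilding NumPy itself:

    # setup.py -- build "myext" (hypothetical) against the installed NumPy
    from distutils.core import setup, Extension
    import numpy as np

    ext = Extension("myext",
                    sources=["myext.c"],              # C code using the NumPy C API
                    include_dirs=[np.get_include()])  # headers of the installed NumPy
    setup(name="myext", version="0.1", ext_modules=[ext])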

If you still want to build it by yourself, could you give us the full
output of your build ?

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] [ANN] Bento 0.0.8.1

2012-04-01 Thread David Cournapeau
Hi,

I am pleased to announce a new release of bento, a packaging solution
for python which aims at reproducibility, extensibility and simplicity.

The main features of this 0.0.8.1 release are:

- Path sections can now use conditionals
- More reliable convert command to migrate
distutils/setuptools/distribute/distutils2 setup.py to bento
- Single-file distribution can now include waf itself
- Nose is not necessary to run the test suite anymore
- Significant improvements to the distutils compatibility layer
- LibraryDir support for backward compatibility with distutils packages
relying on the package_dir feature

Bento source code can be found on github: https://github.com/cournape/Bento
Bento documentation is there as well: https://cournape.github.com/Bento

regards,

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] NumPy EIG much slower than MATLAB EIG

2012-04-02 Thread David Cournapeau
On Sun, Apr 1, 2012 at 2:28 PM, Kamesh Krishnamurthy kames...@gmail.comwrote:

 Hello all,

 I profiled NumPy EIG and MATLAB EIG on the same Macbook pro, and both were
 linking to the Accelerate framework BLAS. NumPy turns out to be ~4x slower.
 I've posted details on Stackoverflow:
 http://stackoverflow.com/q/9955021/974568

 Can someone please let me know the reason for the performance gap?


I would look at two things:

   - first, are you sure matlab is not using the MKL instead of the Accelerate
framework ? I have not used matlab in ages, but you should be able to check
by running otool -L on some of the core libraries of matlab, to find out which
libraries are linked to it (a numpy-side check is sketched after this list)
   - second, it could be that matlab eig and numpy eig don't use the same
underlying lapack API (do they give you the same result ?). This would
already be a bit harder to check, unless it is documented explicitly in
matlab.
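
On the numpy side, a quick (hedged) way to see which BLAS/LAPACK numpy was
built against:

    import numpy as np

    np.__config__.show()  # prints the BLAS/LAPACK configuration numpy was built with
    # running otool -L on the lapack_lite extension module under numpy/linalg/
    # then shows which libraries are actually loaded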

regards,

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] NumPy EIG much slower than MATLAB EIG

2012-04-02 Thread David Cournapeau
On Mon, Apr 2, 2012 at 4:45 PM, Chris Barker chris.bar...@noaa.gov wrote:

 On Mon, Apr 2, 2012 at 2:25 AM, Nathaniel Smith n...@pobox.com wrote:
  To see if this is an effect of numpy using C-order by default instead of
  Fortran-order, try measuring eig(x.T) instead of eig(x)?

 Just to be clear, .T re-arranges the strides (making it Fortran
 order), but you'll have to make sure your original data is the
 transpose of what you want.

 I posted this on stackoverflow, but for completeness:

 the code posted on stackoverflow is also profiling the random number
 generation -- I have no idea how numpy and MATLAB's random number
 generation compare, nor how random number generation compares to
 eig(), but you should profile them independently to make sure.


While this is true, the cost is most likely negligible compared to the
cost of eig (unless something weird is going on in random as well).
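
A hedged sketch of timing the two steps separately (sizes arbitrary):

    import time
    import numpy as np

    n = 1000
    t0 = time.time()
    x = np.random.rand(n, n)   # the random generation, timed on its own
    t1 = time.time()
    w, v = np.linalg.eig(x)    # the eig call, timed on its own
    t2 = time.time()
    print("random: %.3fs   eig: %.3fs" % (t1 - t0, t2 - t1))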

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] YouTrack testbed

2012-04-10 Thread David Cournapeau
On Tue, Apr 10, 2012 at 8:40 PM, Ralf Gommers
ralf.gomm...@googlemail.comwrote:



 On Mon, Apr 9, 2012 at 10:32 PM, Bryan Van de Ven bry...@continuum.iowrote:

 On 4/3/12 4:18 PM, Ralf Gommers wrote:
  Here some first impressions.
 
  The good:
  - It's responsive!
  - It remembers my preferences (view type, # of issues per page, etc.)
  - Editing multiple issues with the command window is easy.
  - Search and filter functionality is powerful
 
  The bad:
  - Multiple projects are supported, but issues are then really mixed.
  The way this works doesn't look very useful for combined admin of
  numpy/scipy trackers.
  - I haven't found a way yet to make versions and subsystems appear in
  the one-line issue overview.
  - Fixed issues are still shown by default. There are several open
  issues filed against youtrack about this, with no reasonable answers.
  - Plain text attachments (.txt, .diff, .patch) can't be viewed, only
  downloaded.
  - No direct VCS integration, only via Teamcity (not set up, so can't
  evaluate).
  - No useful default views as in Trac
  (http://projects.scipy.org/scipy/report).

 Ralf,  regarding some of the issues:


 Hi Bryan, thanks for looking into this.


 I think for numpy/scipy trackers, we could simply run separate instances
 of YouTrack for each.


 That would work. It does mean that there's no maintenance advantage over
 using Trac here.

 Also we can certainly create some standard
 queries. It's a small pain not to have useful defaults, but it's only a
 one-time pain. :)


 That should help.


 Also, what kind of integration are you looking for with github? There
 does appear to be the ability to issue commands to youtrack through git
 commits, which does not depend on TeamCity, as best I can tell:

 http://confluence.jetbrains.net/display/YTD3/GitHub+Integration
 http://blogs.jetbrains.com/youtrack/tag/github-integration/

 I'm not sure this is what you were thinking about though.


 That does help. The other thing that's useful is to reference commits
 (like commit:abcd123 in current Trac) and have them turned into links to
 commits on Github. This is not a showstopper for me though.


 For the other issues, Maggie or I can try and see what we can find out
 about implementing them, or working around them, this week.


 I'd say that from the issues I mentioned, the biggest one is the one-line
 view. So these two:

   - I haven't found a way yet to make versions and subsystems appear in
 the one-line issue overview.
   - Fixed issues are still shown by default. There are several open
 issues filed against youtrack about this, with no reasonable answers.


 Of course, we'd like to evaluate any other viable issue trackers as

 well. Do you have any suggestions for other systems besides YouTrack?


 David wrote up some issues (some of which I didn't check) with current
 Trac and looked at Redmine before. He also mentioned Roundup. See
 http://projects.scipy.org/numpy/wiki/ImprovingIssueWorkflow

 Redmine does look good from a quick browse (better view, does display
 diffs). It would be good to get the opinions of a few more people on this
 topic.


Redmine is trac on RoR, but it solves two significant issues over trac:
  - mass edit (e.g. moving things to a new milestone is simple and doable
from the UI)
  - REST API by default, so that we can build simple command line tools on
top of it (this changed since I made the wiki page)

It is a PITA to install, though, at least if you are not familiar with
ruby, and I heard it is hard to manage as well.

IIRC, roundup was suggested by Robert, but it is more of a custom solution
I believe.

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] YouTrack testbed

2012-04-12 Thread David Cournapeau
On Thu, Apr 12, 2012 at 5:43 PM, Ralf Gommers
ralf.gomm...@googlemail.comwrote:



 On Tue, Apr 10, 2012 at 9:53 PM, David Cournapeau courn...@gmail.comwrote:



 On Tue, Apr 10, 2012 at 8:40 PM, Ralf Gommers 
 ralf.gomm...@googlemail.com wrote:



 On Mon, Apr 9, 2012 at 10:32 PM, Bryan Van de Ven 
 bry...@continuum.iowrote:

 On 4/3/12 4:18 PM, Ralf Gommers wrote:
  Here some first impressions.
 
  The good:
  - It's responsive!
  - It remembers my preferences (view type, # of issues per page, etc.)
  - Editing multiple issues with the command window is easy.
  - Search and filter functionality is powerful
 
  The bad:
  - Multiple projects are supported, but issues are then really mixed.
  The way this works doesn't look very useful for combined admin of
  numpy/scipy trackers.
  - I haven't found a way yet to make versions and subsystems appear in
  the one-line issue overview.
  - Fixed issues are still shown by default. There are several open
  issues filed against youtrack about this, with no reasonable answers.
  - Plain text attachments (.txt, .diff, .patch) can't be viewed, only
  downloaded.
  - No direct VCS integration, only via Teamcity (not set up, so can't
  evaluate).
  - No useful default views as in Trac
  (http://projects.scipy.org/scipy/report).

 Ralf,  regarding some of the issues:


 Hi Bryan, thanks for looking into this.


 I think for numpy/scipy trackers, we could simply run separate instances
 of YouTrack for each.


 That would work. It does mean that there's no maintenance advantage over
 using Trac here.

 Also we can certainly create some standard
 queries. It's a small pain not to have useful defaults, but it's only a
 one-time pain. :)


 That should help.


 Also, what kind of integration are you looking for with github? There
 does appear to be the ability to issue commands to youtrack through git
 commits, which does not depend on TeamCity, as best I can tell:

 http://confluence.jetbrains.net/display/YTD3/GitHub+Integration
 http://blogs.jetbrains.com/youtrack/tag/github-integration/

 I'm not sure this is what you were thinking about though.


 That does help. The other thing that's useful is to reference commits
 (like commit:abcd123 in current Trac) and have them turned into links to
 commits on Github. This is not a showstopper for me though.


 For the other issues, Maggie or I can try and see what we can find out
 about implementing them, or working around them, this week.


 I'd say that from the issues I mentioned, the biggest one is the
 one-line view. So these two:

   - I haven't found a way yet to make versions and subsystems appear in
 the one-line issue overview.
   - Fixed issues are still shown by default. There are several open
 issues filed against youtrack about this, with no reasonable
 answers.


 Of course, we'd like to evaluate any other viable issue trackers as

 well. Do you have any suggestions for other systems besides YouTrack?


 David wrote up some issues (some of which I didn't check) with current
 Trac and looked at Redmine before. He also mentioned Roundup. See
 http://projects.scipy.org/numpy/wiki/ImprovingIssueWorkflow

 Redmine does look good from a quick browse (better view, does display
 diffs). It would be good to get the opinions of a few more people on this
 topic.


 Redmine is trac on RoR, but it solves two significant issues over trac:
   - mass edit (e.g. moving things to a new milestone is simple and doable
 from the UI)
   - REST API by default, so that we can build simple command line tools
 on top of it (this changed since I made the wiki page)

 It is a PITA to install, though, at least if you are not familiar with
 ruby, and I heard it is hard to manage as well.


 Thanks, that's a clear description of pros and cons. It's also easy to
 play with Redmine at demo.redmine.org. That site allows you to set up a
 new project and try the admin interface.


And I just discovered this (and in python !)

https://github.com/coiled-coil/git-redmine

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] YouTrack testbed

2012-04-12 Thread David Cournapeau
On Thu, Apr 12, 2012 at 9:29 PM, william ratcliff 
william.ratcl...@gmail.com wrote:

 Has anyone tried Rietveld, Gerrit, or Phabricator?


rietveld and gerrit are code review tools. I have not heard of phabricator,
but this article certainly makes it sound interesting:
http://www.readwriteweb.com/hack/2011/09/a-look-at-phabricator-facebook.php

There is a quite complete command line interface, arcanist, and if done
right, having code review and bug tracking integrated together sounds
exciting.

Thanks for mentioning it, I will definitely look into it.

regards,

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] What is consensus anyway

2012-04-25 Thread David Cournapeau
On Wed, Apr 25, 2012 at 10:54 PM, Matthew Brett matthew.br...@gmail.comwrote:

 Hi,

 On Wed, Apr 25, 2012 at 2:35 PM, Travis Oliphant tra...@continuum.io
 wrote:
 
  Do you agree that Numpy has not been very successful in recruiting and
  maintaining new developers compared to its large user-base?
 
  Compared to - say - Sympy?
 
  Why do you think this is?
 
  I think it's mostly because it's infrastructure that is a means to an
 end.   I certainly wasn't excited to have to work on NumPy originally, when
 my main interest was SciPy.I've come to love the interesting plateau
 that NumPy lives on.But, I think it mostly does the job it is supposed
 to do. The fact that it is in C is also not very sexy.   It is also
 rather complicated with a lot of inter-related parts.
 
  I think NumPy could do much, much more --- but getting there is going to
 be a challenge of execution and education.
 
  You can get to know the code base.  It just takes some time and
 patience.   You also have to be comfortable with compilers and building
 software just to tweak the code.
 
 
 
  Would you consider asking that question directly on list and asking
  for the most honest possible answers?
 
  I'm always interested in honest answers and welcome any sincere
 perspective.

 Of course, there are potential explanations:

 1) Numpy is too low-level for most people
 2) The C code is too complicated
 3) It's fine already, more or less

 are some obvious ones. I would say these are the easy answers. But of
 course, the easy answer may not be the right answer. It may not be
 easy to get the right answer [1].   As you can see from Alan Isaac's reply
 on this thread, even asking the question can be taken as being in bad
 faith.  In that situation, I think you'll find it hard to get sincere
 replies.


While I don't think jumping into NumPy C code is as difficult as some
people make it out to be, I think numpy has reaped most of the low-hanging
fruit, and is now at a stage where it requires massive investment to get
significantly better.

I would suggest a different question, whose answer may serve as a proxy to
uncover the lack of contributions: what needs to be done in NumPy, and how
can we make it simpler for newcomers ? Here is an incomplete,
unashamedly biased list:

  - Fewer dependencies on CPython internals
  - Allow 3rd parties to extend numpy at the C level in more
fundamental ways (e.g. I wish something like the half-float dtype could be
more easily developed out of tree)
  - Separate memory representation from higher level representation
(slicing, broadcasting, etc…), to allow arrays to sit on non-contiguous
memory areas, etc…
  - Test and performance infrastructure so we can track our evolution, get
coverage of our C code, etc…
  - Fix bugs
  - Better integration with 3rd party on-disk storage (database, etc…)

None of that is particularly simple nor has a fast learning curve, except
for fixing bugs and maybe some of the infrastructure. I think most of this
is necessary for the things Travis talked about a few weeks ago.

What could make contributions easier:
  - different levels of C API documentation (still lacking anything besides
reference)
  - ways to detect early when we break the ABI or slightly more obscure
platforms (we need good CI, ways to publish binaries that people can easily
test, etc...)
  - improve infrastructure so that we can focus on the things we want to
work on (improve the dire situation with bug tracking, etc…)

Also, lots of people just don't know/want to know C. But people with, say,
web skills would be welcome: we have a website that could use some help…
So
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Quaternion data type

2012-05-06 Thread David Cournapeau
On Sat, May 5, 2012 at 9:43 PM, Mark Wiebe mwwi...@gmail.com wrote:

 On Sat, May 5, 2012 at 1:06 PM, Charles R Harris 
 charlesr.har...@gmail.com wrote:

 On Sat, May 5, 2012 at 11:19 AM, Mark Wiebe mwwi...@gmail.com wrote:

 On Sat, May 5, 2012 at 11:55 AM, Charles R Harris 
 charlesr.har...@gmail.com wrote:

 On Sat, May 5, 2012 at 5:27 AM, Tom Aldcroft 
 aldcr...@head.cfa.harvard.edu wrote:

 On Fri, May 4, 2012 at 11:44 PM, Ilan Schnell ischn...@enthought.com
 wrote:
  Hi Chuck,
 
  thanks for the prompt reply.  I as curious because because
  someone was interested in adding
 http://pypi.python.org/pypi/Quaternion
  to EPD, but Martin and Mark's implementation of quaternions
  looks much better.

 Hi -

 I'm a co-author of the above mentioned Quaternion package.  I agree
 the numpy_quaternion version would be better, but if there is no
 expectation that it will move forward I can offer to improve our
 Quaternion.  A few months ago I played around with making it accept
 arbitrary array inputs (with similar shape of course) to essentially
 vectorize the transformations.  We never got around to putting this in
 a release because of a perceived lack of interest / priorities... If
 this would be useful then let me know.


 Would you be interested in carrying Martin's package forward? I'm not
 opposed to having quaternions in numpy/scipy but there needs to be someone
 to push it and deal with problems if they come up. Martin's package
 disappeared in large part because Martin disappeared. I'd also like to hear
 from Mark about other aspects, as there was also a simple rational user
 type proposed that we were looking to put in as an extension 'test' type.
 IIRC, there were some needed fixes to Numpy, some of which were postponed
 in favor of larger changes. User types is one of the things we want to get
 fixed up.


 I kind of like the idea of there being a package, separate from numpy,
 which collects these dtypes together. To start, the quaternion and the
 rational type could go in it, and eventually I think it would be nice to
 move datetime64 there as well. Maybe it could be called numpy-dtypes, or
 would a more creative name be better?


 I'm trying to think about how that would be organized. We could create a
 new repository, numpy-user-types (numpy-extension-types), under the numpy
 umbrella. It would need documents and such as well as someone interested in
 maintaining it and making releases. A branch in the numpy repository
 wouldn't work since we would want to rebase it regularly. It could maybe go
 in scipy but a new package would need to be created there and it feels too
 distant from numpy for such basic types as datetime.

 Do you have thoughts about the details?


 Another repository under the numpy umbrella would best fit what I'm
 imagining, yes. I would imagine it as a package of additional types that
 aren't the core ones, but that many people would probably want to install.
 It would also be a way to continually exercise the type extension system,
 to make sure it doesn't break. It couldn't be a branch of numpy, rather a
 collection of additional dtypes and associated useful functions.


I would be in favor of this as well. We could start the repository by
having one trivial dtype that would serve as an example. That's something
I have been interested in; I can set aside a couple of hours / week to help
with this.

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fixing issue of future opaqueness of ndarray this summer

2012-05-14 Thread David Cournapeau
On Mon, May 14, 2012 at 5:31 PM, mark florisson
markflorisso...@gmail.comwrote:

 On 12 May 2012 22:55, Dag Sverre Seljebotn d.s.seljeb...@astro.uio.no
 wrote:
  On 05/11/2012 03:37 PM, mark florisson wrote:
 
  On 11 May 2012 12:13, Dag Sverre Seljebotnd.s.seljeb...@astro.uio.no
   wrote:
 
  (NumPy devs: I know, I get too many ideas. But this time I *really*
  believe
  in it, I think this is going to be *huge*. And if Mark F. likes it it's
  not
  going to be without manpower; and as his mentor I'd pitch in too here
 and
  there.)
 
  (Mark F.: I believe this is *very* relevant to your GSoC. I certainly
  don't
  want to micro-manage your GSoC, just have your take.)
 
  Travis, thank you very much for those good words in the NA-mask
  interactions... thread. It put most of my concerns away. If anybody is
  leaning towards for opaqueness because of its OOP purity, I want to
 refer
  to
  C++ and its walled-garden of ideological purity -- it has, what, 3-4
  different OOP array libraries, neither of which is able to out-compete
  the
  other. Meanwhile the rest of the world happily cooperates using
 pointers,
  strides, CSR and CSC.
 
  Now, there are limits to what you can do with strides and pointers.
  Noone's
  denying the need for more. In my mind that's an API where you can do
  fetch_block and put_block of cache-sized, N-dimensional blocks on an
  array;
  but it might be something slightly different.
 
  Here's what I'm asking: DO NOT simply keep extending ndarray and the
  NumPy C
  API to deal with this issue.
 
  What we need is duck-typing/polymorphism at the C level. If you keep
  extending ndarray and the NumPy C API, what we'll have is a one-to-many
  relationship: One provider of array technology, multiple consumers
 (with
  hooks, I'm sure, but all implementations of the hook concept in the
 NumPy
  world I've seen so far are a total disaster!).
 
  What I think we need instead is something like PEP 3118 for the
  abstract
  array that is only available block-wise with getters and setters. On
 the
  Cython list we've decided that what we want for CEP 1000 (for boxing
  callbacks etc.) is to extend PyTypeObject with our own fields; we could
  create CEP 1001 to solve this issue and make any Python object an
  exporter
  of block-getter/setter-arrays (better name needed).
 
  What would be exported is (of course) a simple vtable:
 
  typedef struct {
 int (*get_block)(void *ctx, ssize_t *upper_left, ssize_t
 *lower_right,
  ...);
 ...
  } block_getter_setter_array_vtable;
 
  Let's please discuss the details *after* the fundamentals. But the
 reason
  I
  put void* there instead of PyObject* is that I hope this could be used
  beyond the Python world (say, Python-Julia); the void* would be
 handed
  to
  you at the time you receive the vtable (however we handle that).
 
 
  I suppose it would also be useful to have some way of predicting the
  output format polymorphically for the caller. E.g. dense *
  block_diagonal results in block diagonal, but dense + block_diagonal
  results in dense, etc. It might be useful for the caller to know
  whether it needs to allocate a sparse, dense or block-structured
  array. Or maybe the polymorphic function could even do the allocation.
  This needs to happen recursively of course, to avoid intermediate
  temporaries. The compiler could easily handle that, and so could numpy
  when it gets lazy evaluation.
 
 
  Ah. But that depends on the computation to be performed too; a)
  elementwise, b) axis-wise reductions, c) linear algebra...
 
  In my oomatrix code (please don't look at it, it's shameful) I do this
 using
  multiple dispatch.
 
  I'd rather ignore this for as long as we can, only implementing a[:] =
 ...
  -- I can't see how decisions here would trickle down to the API that's
 used
  in the kernel, it's more like a pre-phase, and better treated
 orthogonally.
 
 
  I think if the heavy lifting of allocating output arrays and exporting
  these arrays work in numpy, then support in Cython could use that (I
  can already hear certain people object to more complicated array stuff
  in Cython :). Even better here would be an external project that each
  our projects could use (I still think the nditer sorting functionality
  of arrays should be numpy-agnostic and externally available).
 
 
  I agree with the separate project idea. It's trivial for NumPy to
  incorporate that as one of its methods for exporting arrays, and I don't
  think it makes sense to either build it into Cython, or outright depend
 on
  NumPy.
 
  Here's what I'd like (working title: NumBridge?).
 
   - Mission: Be the double* + shape + strides in a world where that is
 no
  longer enough, by providing tight, focused APIs/ABIs that are usable
 across
  C/Fortran/Python.
 
  I basically want something I can quickly acquire from a NumPy array, then
  pass it into my C code without dragging along all the cruft that I don't
  need.
 
   - Written in pure C + specs, 

Re: [Numpy-discussion] Masked Array for NumPy 1.7

2012-05-19 Thread David Cournapeau
On Sat, May 19, 2012 at 3:17 PM, Charles R Harris charlesr.har...@gmail.com
 wrote:



 On Fri, May 18, 2012 at 3:47 PM, Travis Oliphant tra...@continuum.iowrote:

 Hey all,

 After reading all the discussion around masked arrays and getting input
 from as many people as possible, it is clear that there is still
 disagreement about what to do, but there have been some fruitful
 discussions that ensued.

 This isn't really new as there was significant disagreement about what to
 do when the masked array code was initially checked in to master.   So, in
 order to move forward, Mark and I are going to work together with whomever
 else is willing to help with an effort that is in the spirit of my third
 proposal but has a few adjustments.

 The idea will be fleshed out in more detail as it progresses, but the
 basic concept is to create an (experimental) ndmasked object in NumPy 1.7
 and leave the actual ndarray object unchanged.   While the details need to
 be worked out here,  a goal is to have the C-API work with both ndmasked
 arrays and arrayobjects (possibly by defining a base-class C-level
 structure that both ndarrays inherit from). This might also be a good
 way for Dag to experiment with his ideas as well but that is not an
 explicit goal.

 One way this could work, for example is to have PyArrayObject * be the
 base-class array (essentially the same C-structure we have now with a
 HASMASK flag). Then, the ndmasked object could inherit from PyArrayObject *
 as well but add more members to the C-structure. I think this is the
 easiest thing to do and requires the least amount of code-change.  It
 is also possible to define an abstract base-class PyArrayObject * that both
 ndarray and ndmasked inherit from. That way ndarray and ndmasked are
 siblings even though the ndarray would essentially *be* the PyArrayObject *
 --- just with a different type-hierarchy on the python side.

 This work will take some time and, therefore, I don't expect 1.7 to be
 released prior to SciPy Austin with an end of June target date.   The
 timing will largely depend on what time is available from people interested
 in resolving the situation.   Mark and I will have some availability for
 this work in June but not a great deal (about 2 man-weeks total between
 us).If there are others who can step in and help, it will help
 accelerate the process.


 This will be a difficult thing for others to help with since the concept
 is vague, the design decisions seem to be in your and Mark's hands, and you
 say you don't have much time. It looks to me like 1.7 will keep slipping
 and I don't think that is a good thing. Why not go for option 2, which will
 get 1.7 out there and push the new masked array work in to 1.8? Breaking
 the flow of development and release has consequences, few of them good.


Agreed. 1.6.0 was released one year ago already, let's focus on polishing
what's in there *now*. I have not followed closely what the decision was
for an LTS release, but if 1.7 is supposed to be it, that's another argument
against changing anything there for 1.7.

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] [SciPy-Dev] Ubuntu PPA for NumPy / SciPy / ...

2012-06-08 Thread David Cournapeau
On Thu, Jun 7, 2012 at 5:24 PM, Andreas Hilboll li...@hilboll.de wrote:

 Hi,

 I just noticed that there's a PPA for NumPy/SciPy on Launchpad:

   https://launchpad.net/~scipy/+archive/ppa

 However, it's painfully outdated. Does anyone know of its status? Is it
 'official'? Are there any plans in revitalizing it, possibly with adding
 other projects from the scipy universe? Is there help needed?

 Many questions, but possibly quite easy to answer ...


I set up this PPA a long time ago. I just don't have time to maintain it at
this point, but would be happy to give someone the keys to bring it up to
date.

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Neighborhood iterator: way to easily check which elements have already been visited in parent iterator?

2012-06-13 Thread David Cournapeau
Not the neighborhood one, though. It would be good if this iterator had a
cython wrapper, and ndimage used that, though.
On 13 June 2012 18:59, Ralf Gommers ralf.gomm...@googlemail.com
wrote:



 On Wed, Jun 13, 2012 at 6:57 PM, Thouis (Ray) Jones tho...@gmail.comwrote:

 Hello,

 I'm rewriting scipy.ndimage.label() using numpy's iterator API,


 I think there were some changes to the iterator API recently, so please
 keep in mind that scipy has to still be compatible with numpy 1.5.1 (at
 least for now).

 Ralf


 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] [ANN] Bento 0.1.0

2012-06-14 Thread David Cournapeau
Hi,

I am pleased to announce a new release of bento, a packaging solution
for python which aims at reproducibility, extensibility and simplicity.

The main features of this 0.1.0 release are:

- new commands register_pypi and upload_pypi to register a package
to
  pypi and upload tarballs to it.
- add sphinx command to build a package documentation if it uses
  sphinx.
- add tweak_library/tweak_extension functions to build contexts to
  simplify simple builder customization (e.g. include_dirs, defines,
  etc...)
- waf backend: cython tool automatically loaded if cython files are
  detected in sources
- UseBackends feature: allows declaring which build backend to use
  when building C extensions in the bento.info file directly
- add --use-distutils-flags configure option to force using flags
from
  distutils (disabled by default).
- add --disable-autoconfigure build option to bypass configure for
fast
  partial rebuilds. This is not reliable depending on how the
  environment is changed, so one should only use this during
  development.
- add register_metadata API to register new metadata to be filled in
  MetaTemplateFile

Bento source code can be found on github: https://github.com/cournape/Bento
Bento documentation is there as well: https://cournape.github.com/Bento

regards,

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Pull request: Split maskna support out of mainline into a branch

2012-06-14 Thread David Cournapeau
On Thu, Jun 14, 2012 at 5:17 PM, Nathaniel Smith n...@pobox.com wrote:

 On Wed, Jun 6, 2012 at 11:08 PM, Nathaniel Smith n...@pobox.com wrote:
  Just submitted this pull request for discussion:
   https://github.com/numpy/numpy/pull/297
 
  As per earlier discussion on the list, this PR attempts to remove
  exactly and only the maskna-related code from numpy mainline:
   http://mail.scipy.org/pipermail/numpy-discussion/2012-May/062417.html
 
  The suggestion is that we merge this to master for the 1.7 release,
  and immediately git revert it on a branch so that it can be modified
  further without blocking the release.
 
  The first patch does the actual maskna removal; the second and third
  rearrange things so that PyArray_ReduceWrapper does not end up in the
  public API, for reasons described therein.
 
  All tests pass with Python 2.4, 2.5, 2.6, 2.7, 3.1, 3.2 on 64-bit
  Ubuntu. The docs also appear to build. Before I re-based this I also
  tested against Scipy, matplotlib, and pandas, and all were fine.

 While it's tempting to think that the lack of response to this
 email/PR indicates that everyone now agrees with me about how to
 proceed with the NA work, I'm for some reason unconvinced...

 Any objections to merging this?


No objection, but could you wait until this weekend ? I am in the middle of
setting up a buildbot for windows for numpy (for both mingw and MSVC
compilers), and that would be a good way to test it.

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Pull request: Split maskna support out of mainline into a branch

2012-06-17 Thread David Cournapeau
On Sat, Jun 16, 2012 at 9:39 PM, Nathaniel Smith n...@pobox.com wrote:

 On Thu, Jun 14, 2012 at 5:20 PM, David Cournapeau courn...@gmail.com
 wrote:
 
 
  On Thu, Jun 14, 2012 at 5:17 PM, Nathaniel Smith n...@pobox.com wrote:
 
  On Wed, Jun 6, 2012 at 11:08 PM, Nathaniel Smith n...@pobox.com wrote:
   Just submitted this pull request for discussion:
https://github.com/numpy/numpy/pull/297
  
   As per earlier discussion on the list, this PR attempts to remove
   exactly and only the maskna-related code from numpy mainline:
  
 http://mail.scipy.org/pipermail/numpy-discussion/2012-May/062417.html
  
   The suggestion is that we merge this to master for the 1.7 release,
   and immediately git revert it on a branch so that it can be modified
   further without blocking the release.
  
   The first patch does the actual maskna removal; the second and third
   rearrange things so that PyArray_ReduceWrapper does not end up in the
   public API, for reasons described therein.
  
   All tests pass with Python 2.4, 2.5, 2.6, 2.7, 3.1, 3.2 on 64-bit
   Ubuntu. The docs also appear to build. Before I re-based this I also
   tested against Scipy, matplotlib, and pandas, and all were fine.
 
  While it's tempting to think that the lack of response to this
  email/PR indicates that everyone now agrees with me about how to
  proceed with the NA work, I'm for some reason unconvinced...
 
  Any objections to merging this?
 
 
  No objection, but could you wait for this WE ? I am in the middle of
 setting
  up a buildbot for windows for numpy (for both mingw and MSVC compilers),
 and
  that would be a good way to test it.

 Sounds like we have consensus and the patch is good to go, so let me
 know when you're ready...


Setting up the windows buildbot is even more of a pain than I expected :(

In the end, I just tested your branch with MSVC for python 2.7 (32 bits),
and got the following errors related to NA:

==
ERROR: test_numeric.TestIsclose.test_masked_arrays
--
Traceback (most recent call last):
  File C:\Python27\lib\site-packages\nose-1.1.2-py2.7.egg\nose\case.py,
line 197, in runTest
self.test(*self.arg)
  File C:\Users\david\tmp\numpy-git\numpy\core\tests\test_numeric.py,
line 1274, in test_masked_arrays
assert_(type(x) == type(isclose(inf, x)))
  File C:\Users\david\tmp\numpy-git\numpy\core\numeric.py, line 2073, in
isclose
cond[~finite] = (x[~finite] == y[~finite])
  File C:\Users\david\tmp\numpy-git\numpy\ma\core.py, line 3579, in __eq__
check = ndarray.__eq__(self.filled(0), odata).view(type(self))
AttributeError: 'NotImplementedType' object has no attribute 'view'

==
ERROR: Test a special case for var
--
Traceback (most recent call last):
  File C:\Users\david\tmp\numpy-git\numpy\ma\tests\test_core.py, line
2735, in test_varstd_specialcases
_ = method(out=nout)
  File C:\Users\david\tmp\numpy-git\numpy\ma\core.py, line 4778, in std
dvar = sqrt(dvar)
  File C:\Users\david\tmp\numpy-git\numpy\ma\core.py, line 849, in
__call__
m |= self.domain(d)
  File C:\Users\david\tmp\numpy-git\numpy\ma\core.py, line 801, in
__call__
return umath.less(x, self.critical_value)
RuntimeWarning: invalid value encountered in less

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Created NumPy 1.7.x branch

2012-06-25 Thread David Cournapeau
On Tue, Jun 26, 2012 at 4:10 AM, Ondřej Čertík ondrej.cer...@gmail.com wrote:


 My understanding is that Travis is simply trying to stress We have to
 think about the implications of our changes on existing users. and
 also that little changes (with the best intentions!) that however mean
 either a breakage or confusion for users (due to historical reasons)
 should be avoided if possible. And I very strongly feel the same way.
 And I think that most people on this list do as well.

I think Travis is more concerned about API than ABI changes (in that
example for 1.4, the ABI breakage was caused by a change that was
pushed by Travis IIRC).

The relative importance of API vs ABI is a tough one: I think ABI
breakage is as bad as API breakage (but they matter in different
circumstances), but it is hard to improve the situation around our ABI
without changing the API (especially everything around macros and
publicly accessible structures). Changing this is politically
difficult because nobody will upgrade to a new numpy with a different
API just because it is cleaner, but without a cleaner API, it will be
difficult to implement quite a few improvements. The situation is not
that different from python 3, which has seen poor adoption, and is only
now starting to have interesting features of its own.

As for more concrete actions: I believe Wes McKinney has a
comprehensive suite with multiple versions of numpy/pandas, I can't
seem to find where that was mentioned, though. This would be a good
starting point to check ABI matters (say pandas, mpl, scipy on top of
multiple numpy).

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Created NumPy 1.7.x branch

2012-06-25 Thread David Cournapeau
On Tue, Jun 26, 2012 at 4:42 AM, Ondřej Čertík ondrej.cer...@gmail.com wrote:
 On Mon, Jun 25, 2012 at 8:35 PM, David Cournapeau courn...@gmail.com wrote:
 On Tue, Jun 26, 2012 at 4:10 AM, Ondřej Čertík ondrej.cer...@gmail.com 
 wrote:


 My understanding is that Travis is simply trying to stress We have to
 think about the implications of our changes on existing users. and
 also that little changes (with the best intentions!) that however mean
 either a breakage or confusion for users (due to historical reasons)
 should be avoided if possible. And I very strongly feel the same way.
 And I think that most people on this list do as well.

 I think Travis is more concerned about API than ABI changes (in that
 example for 1.4, the ABI breakage was caused by a change that was
 pushed by Travis IIRC).

 The relative importance of API vs ABI is a tough one: I think ABI
 breakage is as bad as API breakage (but matter in different
 circumstances), but it is hard to improve the situation around our ABI
 without changing the API (especially everything around macros and
 publicly accessible structures). Changing this is politically
 difficult because nobody will upgrade to a new numpy with a different
 API just because it is cleaner, but without a cleaner API, it will be
 difficult to implement quite a few improvements. The situation is not
 that different form python 3, which has seen a poor adoption, and only
 starts having interesting feature on its own now.

 As for more concrete actions: I believe Wes McKinney has a
 comprehensive suite with multiple versions of numpy/pandas, I can't
 seem to find where that was mentioned, though. This would be a good
 starting point to check ABI matters (say pandas, mpl, scipy on top of
 multiple numpy).

 I will try to check as many packages as I can to see what actual problems
 arise. I have created an issue for it:

 https://github.com/numpy/numpy/issues/319

 Feel free to add more packages that you feel are important. I will try to 
 check
 at least the ones that are in the issue, and more if I have time. I will
 close the issue once the upgrade path is clearly documented in the release
 for every thing that breaks.

I believe the baseline can be 1.4.1, against which we build the different
packages, and then test each new version.

There are also tools to check ABI compatibility (e.g.
http://ispras.linuxbase.org/index.php/ABI_compliance_checker), but I
have never used them. Being able to tell when a version of numpy
breaks ABI would already be a good improvement.

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Created NumPy 1.7.x branch

2012-06-25 Thread David Cournapeau
On Tue, Jun 26, 2012 at 5:17 AM, Travis Oliphant tra...@continuum.io wrote:

 On Jun 25, 2012, at 10:35 PM, David Cournapeau wrote:

 On Tue, Jun 26, 2012 at 4:10 AM, Ondřej Čertík ondrej.cer...@gmail.com 
 wrote:


 My understanding is that Travis is simply trying to stress We have to
 think about the implications of our changes on existing users. and
 also that little changes (with the best intentions!) that however mean
 either a breakage or confusion for users (due to historical reasons)
 should be avoided if possible. And I very strongly feel the same way.
 And I think that most people on this list do as well.

 I think Travis is more concerned about API than ABI changes (in that
 example for 1.4, the ABI breakage was caused by a change that was
 pushed by Travis IIRC).

 In the present climate, I'm going to have to provide additional context to a 
 comment like this.  This is not an accurate enough characterization of 
 events.   I was trying to get date-time changes in, for sure.   I generally 
 like feature additions to NumPy.   (Robert Kern was also involved with that 
 effort and it was funded by an active user of NumPy.    I was concerned that 
 the changes would break the ABI.

I did not mean to go back over old history, sorry. My main point was to
highlight ABI vs API issues. Numpy needs to decide whether it attempts
to keep the ABI or not. We already had this discussion 2 years ago (for
the issue mentioned by Ondrej), and no decision was made. The
arguments and their value have not really changed. The issue is thus
that a decision needs to be made over that disagreement, one way or
the other.

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Created NumPy 1.7.x branch

2012-06-26 Thread David Cournapeau
On Tue, Jun 26, 2012 at 10:27 AM, Dag Sverre Seljebotn
d.s.seljeb...@astro.uio.no wrote:
 On 06/26/2012 05:35 AM, David Cournapeau wrote:
 On Tue, Jun 26, 2012 at 4:10 AM, Ondřej Čertíkondrej.cer...@gmail.com  
 wrote:


 My understanding is that Travis is simply trying to stress We have to
 think about the implications of our changes on existing users. and
 also that little changes (with the best intentions!) that however mean
 either a breakage or confusion for users (due to historical reasons)
 should be avoided if possible. And I very strongly feel the same way.
 And I think that most people on this list do as well.

 I think Travis is more concerned about API than ABI changes (in that
 example for 1.4, the ABI breakage was caused by a change that was
 pushed by Travis IIRC).

 The relative importance of API vs ABI is a tough one: I think ABI
 breakage is as bad as API breakage (but matter in different
 circumstances), but it is hard to improve the situation around our ABI
 without changing the API (especially everything around macros and
 publicly accessible structures). Changing this is politically

 But I think it is *possible* to get to a situation where ABI isn't
 broken without changing API. I have posted such a proposal.
 If one uses the kind of C-level duck typing I describe in the link
 below, one would do

 typedef PyObject PyArrayObject;

 typedef struct {
    ...
 } NumPyArray; /* used to be PyArrayObject */

Maybe we're just in violent agreement, but whatever ends up being used
would require changing the *current* C API, right ? If one wants to
allow changes to our structures more freely, we have to hide them
from the headers, which means breaking the code that depends on the
structure's binary layout. Any code that accesses those structures
directly will need to be changed.

There is the particular issue of iterators, which seem quite difficult
to make ABI-safe without losing significant performance.

cheers,

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] moving forward around ABI/API compatibilities (was numpy 1.7.x branch)

2012-06-26 Thread David Cournapeau
Hi,

I am just continuing the discussion around ABI/API (the technical side
of things, that is), as this is unrelated to the 1.7.x release.

On Tue, Jun 26, 2012 at 11:41 AM, Dag Sverre Seljebotn
d.s.seljeb...@astro.uio.no wrote:
 On 06/26/2012 11:58 AM, David Cournapeau wrote:
 On Tue, Jun 26, 2012 at 10:27 AM, Dag Sverre Seljebotn
 d.s.seljeb...@astro.uio.no  wrote:
 On 06/26/2012 05:35 AM, David Cournapeau wrote:
 On Tue, Jun 26, 2012 at 4:10 AM, Ondřej Čertíkondrej.cer...@gmail.com    
 wrote:


 My understanding is that Travis is simply trying to stress We have to
 think about the implications of our changes on existing users. and
 also that little changes (with the best intentions!) that however mean
 either a breakage or confusion for users (due to historical reasons)
 should be avoided if possible. And I very strongly feel the same way.
 And I think that most people on this list do as well.

 I think Travis is more concerned about API than ABI changes (in that
 example for 1.4, the ABI breakage was caused by a change that was
 pushed by Travis IIRC).

 The relative importance of API vs ABI is a tough one: I think ABI
 breakage is as bad as API breakage (but matter in different
 circumstances), but it is hard to improve the situation around our ABI
 without changing the API (especially everything around macros and
 publicly accessible structures). Changing this is politically

 But I think it is *possible* to get to a situation where ABI isn't
 broken without changing API. I have posted such a proposal.
 If one uses the kind of C-level duck typing I describe in the link
 below, one would do

 typedef PyObject PyArrayObject;

 typedef struct {
     ...
 } NumPyArray; /* used to be PyArrayObject */

 Maybe we're just in violent agreement, but whatever ends up being used
 would require to change the *current* C API, right ? If one wants to

 Accessing arr-dims[i] directly would need to change. But that's been
 discouraged for a long time. By API I meant access through the macros.

 One of the changes under discussion here is to change PyArray_SHAPE from
 a macro that accepts both PyObject* and PyArrayObject* to a function
 that only accepts PyArrayObject* (hence breakage). I'm saying that under
 my proposal, assuming I or somebody else can find the time to implement
 it under, you can both make it a function and have it accept both
 PyObject* and PyArrayObject* (since they are the same), undoing the
 breakage but allowing to hide the ABI.

 (It doesn't give you full flexibility in ABI, it does require that you
 somewhere have an npy_intp dims[nd] with the same lifetime as your
 object, etc., but I don't consider that a big disadvantage).

 allow for changes in our structures more freely, we have to hide them
 from the headers, which means breaking the code that depends on the
 structure binary layout. Any code that access those directly will need
 to be changed.

 There is the particular issue of iterator, which seem quite difficult
 to make ABI-safe without losing significant performance.

 I don't agree (for some meanings of ABI-safe). You can export the data
 (dataptr/shape/strides) through the ABI, then the iterator uses these in
 whatever way it wishes consumer-side. Sort of like PEP 3118 without the
 performance degradation. The only sane way IMO of doing iteration is
 building it into the consumer anyway.

(I have not read the whole cython discussion yet)

What do you mean by building iteration in the consumer ? My
understanding is that any data export would be done through a level of
indirection (dataptr/shape/strides). Conceptually, I can't see how one
could keep the ABI without that level of indirection, short of some
compile-time step. In the case of iterators, that means multiple pointer
chases per sample -- i.e. the tight loop issue you mentioned earlier for
PyArray_DATA is the common case for iterators.

I can only see two ways of doing fast (special casing) iteration:
compile-time special casing or runtime optimization. Compile-time
requires access to the internals (even if one were to use C++ with
advanced template magic ala STL/iterator, I don't think one can get
performance if everything is not in the headers, but maybe C++
compilers are super smart these days in ways I can't comprehend). I
would think runtime is the long-term solution, but that's far away,

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] moving forward around ABI/API compatibilities (was numpy 1.7.x branch)

2012-06-26 Thread David Cournapeau
On Tue, Jun 26, 2012 at 2:40 PM, Dag Sverre Seljebotn
d.s.seljeb...@astro.uio.no wrote:
 On 06/26/2012 01:48 PM, David Cournapeau wrote:
 Hi,

 I am just continuing the discussion around ABI/API, the technical side
 of things that is, as this is unrelated to 1.7.x. release.

 On Tue, Jun 26, 2012 at 11:41 AM, Dag Sverre Seljebotn
 d.s.seljeb...@astro.uio.no  wrote:
 On 06/26/2012 11:58 AM, David Cournapeau wrote:
 On Tue, Jun 26, 2012 at 10:27 AM, Dag Sverre Seljebotn
 d.s.seljeb...@astro.uio.no    wrote:
 On 06/26/2012 05:35 AM, David Cournapeau wrote:
 On Tue, Jun 26, 2012 at 4:10 AM, Ondřej Čertíkondrej.cer...@gmail.com  
     wrote:


 My understanding is that Travis is simply trying to stress We have to
 think about the implications of our changes on existing users. and
 also that little changes (with the best intentions!) that however mean
 either a breakage or confusion for users (due to historical reasons)
 should be avoided if possible. And I very strongly feel the same way.
 And I think that most people on this list do as well.

 I think Travis is more concerned about API than ABI changes (in that
 example for 1.4, the ABI breakage was caused by a change that was
 pushed by Travis IIRC).

 The relative importance of API vs ABI is a tough one: I think ABI
 breakage is as bad as API breakage (but matter in different
 circumstances), but it is hard to improve the situation around our ABI
 without changing the API (especially everything around macros and
 publicly accessible structures). Changing this is politically

 But I think it is *possible* to get to a situation where ABI isn't
 broken without changing API. I have posted such a proposal.
 If one uses the kind of C-level duck typing I describe in the link
 below, one would do

 typedef PyObject PyArrayObject;

 typedef struct {
      ...
 } NumPyArray; /* used to be PyArrayObject */

 Maybe we're just in violent agreement, but whatever ends up being used
 would require to change the *current* C API, right ? If one wants to

 Accessing arr-dims[i] directly would need to change. But that's been
 discouraged for a long time. By API I meant access through the macros.

 One of the changes under discussion here is to change PyArray_SHAPE from
 a macro that accepts both PyObject* and PyArrayObject* to a function
 that only accepts PyArrayObject* (hence breakage). I'm saying that under
 my proposal, assuming I or somebody else can find the time to implement
 it under, you can both make it a function and have it accept both
 PyObject* and PyArrayObject* (since they are the same), undoing the
 breakage but allowing to hide the ABI.

 (It doesn't give you full flexibility in ABI, it does require that you
 somewhere have an npy_intp dims[nd] with the same lifetime as your
 object, etc., but I don't consider that a big disadvantage).

 allow for changes in our structures more freely, we have to hide them
 from the headers, which means breaking the code that depends on the
 structure binary layout. Any code that access those directly will need
 to be changed.

 There is the particular issue of iterator, which seem quite difficult
 to make ABI-safe without losing significant performance.

 I don't agree (for some meanings of ABI-safe). You can export the data
 (dataptr/shape/strides) through the ABI, then the iterator uses these in
 whatever way it wishes consumer-side. Sort of like PEP 3118 without the
 performance degradation. The only sane way IMO of doing iteration is
 building it into the consumer anyway.

 (I have not read the whole cython discussion yet)

 I'll try to write a summary and post it when I can get around to it.


 What do you mean by building iteration in the consumer ? My

 consumer is the user of the NumPy C API. So I meant that the iteration
 logic is all in C header files and compiled again for each such
 consumer. Iterators don't cross the ABI boundary.

 understanding is that any data export would be done through a level of
 indirection (dataptr/shape/strides). Conceptually, I can't see how one
 could keep ABI without that level of indirection without some compile.
 In the case of iterator, that means multiple pointer chasing per
 sample -- i.e. the tight loop issue you mentioned earlier for
 PyArray_DATA is the common case for iterator.

 Even if you do indirection, iterator utilities that are compiled in the
 consumer/user code can cache the data that's retrieved.

 Iterators just do

 // setup crossing ABI
 npy_intp *shape = PyArray_DIMS(arr);
 npy_intp *strides = PyArray_STRIDES(arr);
 ...
 // performance-sensitive code just accesses cached pointers and don't
 // cross ABI

The problem is that iterators need more than this. But thinking more
about it, I am not so dead sure we could not get there. I will need to
play with some code.
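
(For what it's worth, the consumer-side caching Dag describes might look
roughly like the sketch below; the accessor names are hypothetical
stand-ins for PyArray_DATA/DIMS/STRIDES, not the actual NumPy calls. The
indirection is crossed once outside the loop, and the hot loop then only
touches cached locals.)

#include <stdio.h>

/* hypothetical array handle and accessors standing in for the calls
   that would cross the ABI boundary */
typedef struct { char *data; long shape[1]; long strides[1]; } FakeArray;
static char *get_data(FakeArray *a)    { return a->data; }
static long *get_shape(FakeArray *a)   { return a->shape; }
static long *get_strides(FakeArray *a) { return a->strides; }

static double sum_1d(FakeArray *a)
{
    /* cross the indirection once, outside the loop ... */
    char *data   = get_data(a);
    long  n      = get_shape(a)[0];
    long  stride = get_strides(a)[0];
    double s = 0.0;
    long i;

    /* ... the tight loop only uses the cached values */
    for (i = 0; i < n; i++)
        s += *(double *)(data + i * stride);
    return s;
}

int main(void)
{
    double buf[4] = {1.0, 2.0, 3.0, 4.0};
    FakeArray a = { (char *)buf, {4}, {sizeof(double)} };
    printf("%g\n", sum_1d(&a));   /* prints 10 */
    return 0;
}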


 Going slightly OT, then IMO, the *only* long-term solution in 2012 is
 LLVM. That allows you to do any level of inlining and special casing and
 optimization at run-time, which is the only way

Re: [Numpy-discussion] Created NumPy 1.7.x branch

2012-06-26 Thread David Cournapeau
On Tue, Jun 26, 2012 at 5:24 PM, Travis Oliphant tra...@continuum.io wrote:

 Let us note that that problem was due to Travis convincing David to
 include the Datetime work in the release against David's own best judgement.
 The result was a delay of several months until Ralf could get up to speed
 and get 1.4.1 out. Let us also note that poly1d is actually not the same as
 Matlab poly1d.


 This is not accurate, Charles.  Please stop trying to dredge up old
 history you don't know the full story about and are trying to create an
 alternate reality about.   It doesn't help anything and is quite poisonous
 to this mailing list.


 I didn't start the discussion of 1.4, nor did I raise the issue at the time
 as I didn't think it would be productive. We moved forward. But in any case,
 I asked David at the time why the datetime stuff got included. I'd welcome
 your version if you care to offer it. That would be more useful than
 accusing me of creating an alternative reality and would clear the air.


 The datetime stuff got included because it is a very useful and important
 feature for multiple users.   It still needed work, but it was in a state
 where it could be tried.   It did require breaking ABI compatibility in the
 state it was in.   My approach was to break ABI compatibility and move
 forward (there were other things we could do at the time that are still
 needed in the code base that will break ABI compatibility in the future).
  David didn't want to break ABI compatibility and so tried to satisfy two
 competing desires in a way that did not ultimately work.     These things
 happen.    We all get to share responsibility for the outcome.

I think Chuck alludes to the fact that I was rather reserved about
merging datetime before *anyone* knew about breaking the ABI. I don't
feel responsible for this issue (except I maybe should have pushed
more strongly about datetime being included), but I am also not
interested in making a big deal out of it, certainly not two years
after the fact. I am merely pointing this out so that you realize that
you may both have different views that could be seen as valid
depending on what you are willing to highlight.

I suggest that Chuck and you take this off-list.

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Combined versus separate build

2012-06-27 Thread David Cournapeau
On Wed, Jun 27, 2012 at 7:17 PM, Nathaniel Smith n...@pobox.com wrote:
 Currently the numpy build system(s) support two ways of building
 numpy: either by compiling a giant concatenated C file, or by the more
 conventional route of first compiling each .c file to a .o file, and
 then linking those together. I gather from comments in the source code
 that the former is the traditional method, and the latter is the newer
 experimental approach.

 It's easy to break one of these builds without breaking the other (I
 just did this with the NA branch, and David had to clean up after me),
 and I don't see what value we really get from having both options --
 it seems to just double the size of the test matrix without adding
 value.

There is unfortunately a big value in it: there is no standard way in
C to share symbols within a library without polluting the whole
process namespace, except on windows where the default is to export
nothing.

Most compilers support it (I actually know of none that does not
support it in some way or another), but that's platform-specific.

I do find the multi-file support useful when developing (it does not
make the full build faster, but I find partial rebuild too slow
without it).

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Combined versus separate build

2012-06-27 Thread David Cournapeau
On Wed, Jun 27, 2012 at 8:07 PM, Nathaniel Smith n...@pobox.com wrote:
 On Wed, Jun 27, 2012 at 7:50 PM, David Cournapeau courn...@gmail.com wrote:
 On Wed, Jun 27, 2012 at 7:17 PM, Nathaniel Smith n...@pobox.com wrote:
 Currently the numpy build system(s) support two ways of building
 numpy: either by compiling a giant concatenated C file, or by the more
 conventional route of first compiling each .c file to a .o file, and
 then linking those together. I gather from comments in the source code
 that the former is the traditional method, and the latter is the newer
 experimental approach.

 It's easy to break one of these builds without breaking the other (I
 just did this with the NA branch, and David had to clean up after me),
 and I don't see what value we really get from having both options --
 it seems to just double the size of the test matrix without adding
 value.

 There is unfortunately a big value in it: there is no standard way in
 C to share symbols within a library without polluting the whole
 process namespace, except on windows where the default is to export
 nothing.

 Most compilers support it (I actually know of none that does not
 support it in some way or the others), but that's platform-specific.

 IIRC this isn't too tricky to arrange for with gcc

No, which is why this is supported for gcc and windows :)

, but why is this an
 issue in the first place for a Python extension module? Extension
 modules are opened without RTLD_GLOBAL, which means that they *never*
 export any symbols. At least, that's how it should work on Linux and
 most Unix-alikes; I don't know much about OS X's linker, except that
 it's unusual in other ways.

The pragmatic answer is that if it were not an issue, python itself
would not bother with it. Every single extension module in python
itself is built from a single compilation unit. This is also why we
have this awful system to export the numpy C API with array of
function pointers instead of simply exporting things in a standard
way.

See this: http://docs.python.org/release/2.5.3/ext/using-cobjects.html
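
(For readers not familiar with that document: the mechanism is to stash
a table of function pointers in a CObject/Capsule attribute of the
module, so that nothing but the module init function has to be an
exported symbol. A stripped-down sketch using the Python 3 capsule API
-- the module and function names are made up, this is not the actual
numpy code:)

#include <Python.h>

static double my_add(double a, double b) { return a + b; }

/* the table of function pointers that other extensions will use */
static void *example_api[] = { (void *)my_add };

static struct PyModuleDef examplemodule = {
    PyModuleDef_HEAD_INIT, "example", NULL, -1, NULL
};

PyMODINIT_FUNC PyInit_example(void)
{
    PyObject *m = PyModule_Create(&examplemodule);
    if (m == NULL)
        return NULL;
    /* wrap the table in a capsule and expose it as example._C_API */
    PyObject *c = PyCapsule_New((void *)example_api, "example._C_API", NULL);
    if (c == NULL || PyModule_AddObject(m, "_C_API", c) < 0) {
        Py_XDECREF(c);
        Py_DECREF(m);
        return NULL;
    }
    return m;
}

/* A consumer extension would then do, roughly:
     void **api = (void **)PyCapsule_Import("example._C_API", 0);
     double (*add)(double, double) = (double (*)(double, double))api[0];
   which is essentially what import_array() does for numpy's API table. */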

Looking quickly at the 2.7.3 sources, the more detailed answer is that
python actually does not use RTLD_LOCAL (nor RTLD_GLOBAL), and what
happens when neither of them is used is implementation-dependent. It
seems to be RTLD_LOCAL on linux, and RTLD_GLOBAL on mac os x. There
also may be consequences on the use of RTLD_LOCAL in embedded mode (I
have ancient and bad memories with matlab related to this, but I
forgot the details).
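
(To spell out the distinction being discussed -- a minimal sketch of the
two dlopen modes, with placeholder library names, not Python's actual
import code:)

#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
    /* RTLD_LOCAL: symbols of the loaded library are not added to the
       global namespace, so later-loaded libraries cannot bind to them. */
    void *h1 = dlopen("./libfoo.so", RTLD_NOW | RTLD_LOCAL);

    /* RTLD_GLOBAL: symbols become visible process-wide, so another
       dlopen()'d library can resolve against them -- and clash. */
    void *h2 = dlopen("./libbar.so", RTLD_NOW | RTLD_GLOBAL);

    if (h1 == NULL || h2 == NULL)
        fprintf(stderr, "dlopen failed: %s\n", dlerror());

    /* When neither flag is passed, POSIX leaves the default to the
       implementation, which is the ambiguity mentioned above. */
    return 0;
}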

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Combined versus separate build

2012-06-27 Thread David Cournapeau
On Wed, Jun 27, 2012 at 8:53 PM, Nathaniel Smith n...@pobox.com wrote:
 On Wed, Jun 27, 2012 at 8:29 PM, David Cournapeau courn...@gmail.com wrote:
 On Wed, Jun 27, 2012 at 8:07 PM, Nathaniel Smith n...@pobox.com wrote:
 On Wed, Jun 27, 2012 at 7:50 PM, David Cournapeau courn...@gmail.com 
 wrote:
 On Wed, Jun 27, 2012 at 7:17 PM, Nathaniel Smith n...@pobox.com wrote:
 Currently the numpy build system(s) support two ways of building
 numpy: either by compiling a giant concatenated C file, or by the more
 conventional route of first compiling each .c file to a .o file, and
 then linking those together. I gather from comments in the source code
 that the former is the traditional method, and the latter is the newer
 experimental approach.

 It's easy to break one of these builds without breaking the other (I
 just did this with the NA branch, and David had to clean up after me),
 and I don't see what value we really get from having both options --
 it seems to just double the size of the test matrix without adding
 value.

 There is unfortunately a big value in it: there is no standard way in
 C to share symbols within a library without polluting the whole
 process namespace, except on windows where the default is to export
 nothing.

 Most compilers support it (I actually know of none that does not
 support it in some way or the others), but that's platform-specific.

 IIRC this isn't too tricky to arrange for with gcc

 No, which is why this is supported for gcc and windows :)

, but why is this an
 issue in the first place for a Python extension module? Extension
 modules are opened without RTLD_GLOBAL, which means that they *never*
 export any symbols. At least, that's how it should work on Linux and
 most Unix-alikes; I don't know much about OS X's linker, except that
 it's unusual in other ways.

 The pragmatic answer is that if it were not an issue, python itself
 would not bother with it. Every single extension module in python
 itself is built from a single compilation unit. This is also why we
 have this awful system to export the numpy C API with array of
 function pointers instead of simply exporting things in a standard
 way.

 The array-of-function-pointers is solving the opposite problem, of
 exporting functions *without* having global symbols.

I meant that the lack of a standard around symbols and namespaces is why
we have to do those hacks. Most platforms have much better solutions
to those problems.


 See this: http://docs.python.org/release/2.5.3/ext/using-cobjects.html

 Looking quickly at the 2.7.3 sources, the more detailed answer is that
 python actually does not use RTLD_LOCAL (nor RTLD_GLOBAL), and what
 happens when neither of them is used is implementation-dependent. It
 seems to be RTLD_LOCAL on linux, and RTLD_GLOBAL on mac os x. There
 also may be consequences on the use of RTLD_LOCAL in embedded mode (I
 have ancient and bad memories with matlab related to this, but I
 forgot the details).

 See, I knew OS X was quirky :-). That's what I get for trusting dlopen(3).

 But seriously, what compilers do we support that don't have
 -fvisibility=hidden? ...Is there even a list of compilers we support
 available anywhere?

Well, I am not sure how all this is handled on the big guys (BlueGene
and co), for one.

There is also the issue of the consequences for statically linking numpy
to python: I don't know what they are (I would actually like to make
statically linking numpy into python easier, not harder).

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Combined versus separate build

2012-06-27 Thread David Cournapeau
On Wed, Jun 27, 2012 at 8:57 PM, Dag Sverre Seljebotn
d.s.seljeb...@astro.uio.no wrote:
 On 06/27/2012 09:53 PM, Nathaniel Smith wrote:
 On Wed, Jun 27, 2012 at 8:29 PM, David Cournapeaucourn...@gmail.com  wrote:
 On Wed, Jun 27, 2012 at 8:07 PM, Nathaniel Smithn...@pobox.com  wrote:
 On Wed, Jun 27, 2012 at 7:50 PM, David Cournapeaucourn...@gmail.com  
 wrote:
 On Wed, Jun 27, 2012 at 7:17 PM, Nathaniel Smithn...@pobox.com  wrote:
 Currently the numpy build system(s) support two ways of building
 numpy: either by compiling a giant concatenated C file, or by the more
 conventional route of first compiling each .c file to a .o file, and
 then linking those together. I gather from comments in the source code
 that the former is the traditional method, and the latter is the newer
 experimental approach.

 It's easy to break one of these builds without breaking the other (I
 just did this with the NA branch, and David had to clean up after me),
 and I don't see what value we really get from having both options --
 it seems to just double the size of the test matrix without adding
 value.

 There is unfortunately a big value in it: there is no standard way in
 C to share symbols within a library without polluting the whole
 process namespace, except on windows where the default is to export
 nothing.

 Most compilers support it (I actually know of none that does not
 support it in some way or the others), but that's platform-specific.

 IIRC this isn't too tricky to arrange for with gcc

 No, which is why this is supported for gcc and windows :)

 , but why is this an
 issue in the first place for a Python extension module? Extension
 modules are opened without RTLD_GLOBAL, which means that they *never*
 export any symbols. At least, that's how it should work on Linux and
 most Unix-alikes; I don't know much about OS X's linker, except that
 it's unusual in other ways.

 The pragmatic answer is that if it were not an issue, python itself
 would not bother with it. Every single extension module in python
 itself is built from a single compilation unit. This is also why we
 have this awful system to export the numpy C API with array of
 function pointers instead of simply exporting things in a standard
 way.

 The array-of-function-pointers is solving the opposite problem, of
 exporting functions *without* having global symbols.

 See this: http://docs.python.org/release/2.5.3/ext/using-cobjects.html

 Looking quickly at the 2.7.3 sources, the more detailed answer is that
 python actually does not use RTLD_LOCAL (nor RTLD_GLOBAL), and what
 happens when neither of them is used is implementation-dependent. It
 seems to be RTLD_LOCAL on linux, and RTLD_GLOBAL on mac os x. There
 also may be consequences on the use of RTLD_LOCAL in embedded mode (I
 have ancient and bad memories with matlab related to this, but I
 forgot the details).

 See, I knew OS X was quirky :-). That's what I get for trusting dlopen(3).

 But seriously, what compilers do we support that don't have
 -fvisibility=hidden? ...Is there even a list of compilers we support
 available anywhere?

 You could at the very least switch the default for a couple of releases,
 introducing a new flag with a please email numpy-discussion if you use
 this note, and see if anybody complains?

Yes, we could. That's actually why I set up travis-CI to build both
configurations in the first place :) (see
https://github.com/numpy/numpy/issues/315)

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Dropping support for Python 2.4 in NumPy 1.8

2012-06-28 Thread David Cournapeau
Hi Travis,

On Thu, Jun 28, 2012 at 1:25 PM, Travis Oliphant tra...@continuum.io wrote:
 Hey all,

 I'd like to propose dropping support for Python 2.4 in NumPy 1.8 (not the 1.7 
 release).      What does everyone think of that?

I think it would depend on the state of 1.7. I am unwilling to drop support
for 2.4 in 1.8 unless we make 1.7 an LTS that would be supported up to
2014 Q1 (when RHEL 5 stops getting security fixes - RHEL 5 is the one
platform that warrants supporting 2.4 IMO).

In my mind, it means 1.7 needs to be stable. The work Ondrej (and others)
are doing to make sure we break neither API nor ABI compared to the last
few releases would help achieve that.

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Non-deterministic test failure in master

2012-06-28 Thread David Cournapeau
On Thu, Jun 28, 2012 at 8:06 PM, Nathaniel Smith n...@pobox.com wrote:
 On Thu, Jun 28, 2012 at 7:13 AM, Pierre Haessig
 pierre.haes...@crans.org wrote:
 Hi Nathaniel,
 Le 27/06/2012 20:22, Nathaniel Smith a écrit :
 According to the Travis-CI build logs, this code produces
 non-deterministic behaviour in master:
 You mean non-deterministic across different builds, not across different
 executions on the same build, right ?

 I just ran a small loop :

 N = 1
 N_good = 0
 for i in range(N):
    a = np.arange(5)
    a[:3] = a[2:]
    if (a == [2,3,4,3,4]).all():
        N_good += 1
 print 'good result : %d/%d' % (N_good, N)

 and got 100 % good replication.

 Yes, the current hypothesis is that there is one particular Travis-CI
 machine on which memcpy goes backwards, and so the test fails whenever
 the build gets assigned to that machine. (Apparently this is actually
 faster on some CPUs, and new versions of glibc are known to exploit
 this.)

see also this: https://bugzilla.redhat.com/show_bug.cgi?id=638477
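
(To make the failure mode concrete: a[:3] = a[2:] copies between
overlapping buffers, and memcpy on overlapping buffers is undefined, so
a glibc whose memcpy copies back-to-front can give a different answer
than memmove. A small standalone C illustration, independent of numpy:)

#include <stdio.h>
#include <string.h>

int main(void)
{
    /* same overlap as a[:3] = a[2:] on np.arange(5) */
    int a[5] = {0, 1, 2, 3, 4};
    int b[5] = {0, 1, 2, 3, 4};
    int i;

    memmove(a, a + 2, 3 * sizeof(int)); /* defined: gives 2 3 4 3 4 */
    memcpy(b, b + 2, 3 * sizeof(int));  /* undefined for overlap: a
                                           back-to-front copy reads b[2]
                                           after overwriting it and can
                                           give 4 3 4 3 4 instead */

    for (i = 0; i < 5; i++) printf("%d ", a[i]);
    printf("\n");
    for (i = 0; i < 5; i++) printf("%d ", b[i]);
    printf("\n");
    return 0;
}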

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Strange problem

2012-06-29 Thread David Cournapeau
On Fri, Jun 29, 2012 at 9:54 AM, Uwe Schmitt uschm...@mineway.de wrote:
 Hi,

 I have unreproducable crashes on a customers Win 7 machine with Python 2.7.2
 and
 Numpy 1.6.1.  He gets the following message:

   Problem signature:
   Problem Event Name: APPCRASH
   Application Name: python.exe
   Application Version: 0.0.0.0
   Application Timestamp: 4df4ba7c
   Fault Module Name: umath.pyd
   Fault Module Version: 0.0.0.0
   Fault Module Timestamp: 4e272b96
   Exception Code: c005
   Exception Offset: 0001983a
   OS Version: 6.1.7601.2.1.0.256.4
   Locale ID: 2055
   Additional Information 1: 0a9e
   Additional Information 2: 0a9e372d3b4ad19135b953a78882e789
   Additional Information 3: 0a9e
   Additional Information 4: 0a9e372d3b4ad19135b953a78882e789

 I know that I can not expect a clear answer without more information, but my
 customer is on hollidays and I just wanted to ask for some hints for
 possible
 reasons. The machine is not out of memory and despite this crash runs very
 stable.

Is this on 32- or 64-bit windows ? Do you know if your customer uses
only numpy, or also other packages that depend on the numpy C extension ?

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Combined versus separate build

2012-07-01 Thread David Cournapeau
On Sun, Jul 1, 2012 at 6:36 PM, Nathaniel Smith n...@pobox.com wrote:
 On Wed, Jun 27, 2012 at 9:05 PM, David Cournapeau courn...@gmail.com wrote:
 On Wed, Jun 27, 2012 at 8:53 PM, Nathaniel Smith n...@pobox.com wrote:
 But seriously, what compilers do we support that don't have
 -fvisibility=hidden? ...Is there even a list of compilers we support
 available anywhere?

 Well, I am not sure how all this is handled on the big guys (bluegen
 and co), for once.

 There is also the issue of the consequence on statically linking numpy
 to python: I don't what they are (I would actually like to make
 statically linked numpy into python easier, not harder).

 All the docs I can find in a quick google seem to say that bluegene
 doesn't do shared libraries at all, though those may be out of date.

 Also, it looks like our current approach is not doing a great job of
 avoiding symbol table pollution... despite all the NPY_NO_EXPORTS all
 over the source, I still count ~170 exported symbols on Linux with
 numpy 1.6, many of them with non-namespaced names
 (_n_to_n_data_copy, _next, npy_tan, etc.) Of course this is
 fixable, but it's interesting that no-one has noticed. (Current master
 brings this up to ~300 exported symbols.)

 It sounds like as far as our officially supported platforms go
 (linux/windows/osx with gcc/msvc), then the ideal approach would be to
 use -fvisibility=hidden or --retain-symbols-file to convince gcc to
 hide symbols by default, like msvc does. That would let us remove
 cruft from the source code, produce a more reliable result, and let us
 use the more convenient separate build, with no real downsides.

What cruft would it allow us to remove ? Whatever method we use, we
need a whitelist of symbols to export.

On the exported list I see on mac, most of them are either from
npymath (npy prefix) or npysort (no prefix, I think this should be
added). Once those are ignored as they should be, there are < 30
symbols exported.

 (Static linking is trickier because no-one uses it anymore so the docs
 aren't great, but I think on Linux at least you could accomplish the
 equivalent by building the static library with 'ld -r ... -o
 tmp-multiarray.a; objcopy --keep-global-symbol=initmultiarray
 tmp-multiarray.a multiarray.a'.)

I am not sure why you say that static linking is not used anymore: I
have met some people who do statically link numpy into python.


 Of course there are presumably other platforms that we don't support
 or test on, but where we have users anyway. Building on such a
 platform sort of intrinsically requires build system hacks, and some
 equivalent to the above may well be available (e.g. I know icc
 supports -fvisibility). So I while I'm not going to do anything about
 this myself in the near future, I'd argue that it would be a good idea
 to:
  - Switch the build-system to export nothing by default when using
 gcc, using -fvisibility=hidden
  - Switch the default build to separate
  - Leave in the single-file build, but not officially supported,
 i.e., we're happy to get patches but it's not used on any systems that
 we can actually test ourselves. (I suspect it's less fragile than the
 separate build anyway, since name clashes are less common than
 forgotten include files.)

I am fine with making the separate build the default (I have a patch
somewhere that does that on supported platforms), but not with using
-fvisibility=hidden. When I implemented the initial support around
this, -fvisibility was buggy on some platforms, including mingw 3.x.

I don't think changing what our implementation does here is worthwhile
given that it works, and -fvisibility=hidden has no big advantages (you
would still need to mark the functions to be exported).

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Combined versus separate build

2012-07-01 Thread David Cournapeau
On Sun, Jul 1, 2012 at 8:32 PM, Nathaniel Smith n...@pobox.com wrote:
 On Sun, Jul 1, 2012 at 7:36 PM, David Cournapeau courn...@gmail.com wrote:
 On Sun, Jul 1, 2012 at 6:36 PM, Nathaniel Smith n...@pobox.com wrote:
 On Wed, Jun 27, 2012 at 9:05 PM, David Cournapeau courn...@gmail.com 
 wrote:
 On Wed, Jun 27, 2012 at 8:53 PM, Nathaniel Smith n...@pobox.com wrote:
 But seriously, what compilers do we support that don't have
 -fvisibility=hidden? ...Is there even a list of compilers we support
 available anywhere?

 Well, I am not sure how all this is handled on the big guys (bluegen
 and co), for once.

 There is also the issue of the consequence on statically linking numpy
 to python: I don't what they are (I would actually like to make
 statically linked numpy into python easier, not harder).

 All the docs I can find in a quick google seem to say that bluegene
 doesn't do shared libraries at all, though those may be out of date.

 Also, it looks like our current approach is not doing a great job of
 avoiding symbol table pollution... despite all the NPY_NO_EXPORTS all
 over the source, I still count ~170 exported symbols on Linux with
 numpy 1.6, many of them with non-namespaced names
 (_n_to_n_data_copy, _next, npy_tan, etc.) Of course this is
 fixable, but it's interesting that no-one has noticed. (Current master
 brings this up to ~300 exported symbols.)

 It sounds like as far as our officially supported platforms go
 (linux/windows/osx with gcc/msvc), then the ideal approach would be to
 use -fvisibility=hidden or --retain-symbols-file to convince gcc to
 hide symbols by default, like msvc does. That would let us remove
 cruft from the source code, produce a more reliable result, and let us
 use the more convenient separate build, with no real downsides.

 What cruft would it allow us to remove ? Whatever method we use, we
 need a whitelist of symbols to export.

 No, right now we don't have a whitelist, we have a blacklist -- every
 time we add a new function or global variable, we have to remember to
 add a NPY_NO_EXPORT tag to its definition. Except the evidence says
 that we don't do that reliably. (Everyone always sucks at maintaining
 blacklists, that's the nature of blacklists.) I'm saying that we'd
 better off if we did have a whitelist. Especially since CPython API
 makes maintaining this whitelist so very trivial -- each module
 exports exactly one symbol!

There may be some confusion on what NPY_NO_EXPORT does: it marks a
function that can be used between compilation units but is not
exported. The choice is between static and NPY_NO_EXPORT, not between
NPY_NO_EXPORT and nothing. In that sense, marking something
NPY_NO_EXPORT is a whitelist.

If we were to use -fvisibility=hidden, we would still need to mark
those functions static (as it would otherwise publish functions in the
single file build).
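
(Schematically, the macro in question boils down to something like the
sketch below -- illustrative only, and the names SEPARATE_COMPILATION_BUILD
and INTERNAL are placeholders, not numpy's exact definitions: in the
single-file build an internal function can simply become static, while in
the separate build it has to be visible to the other .o files but marked
hidden so that it does not leak out of the shared object.)

/* sketch of an NPY_NO_EXPORT-style macro */
#if defined(SEPARATE_COMPILATION_BUILD)
  #if defined(__GNUC__) && __GNUC__ >= 4
    /* visible to the other .o files of the extension, hidden outside
       the resulting shared object */
    #define INTERNAL __attribute__((visibility("hidden")))
  #else
    #define INTERNAL
  #endif
#else
  /* single-file build: internal functions never need external linkage */
  #define INTERNAL static
#endif

INTERNAL int
helper_used_across_files(int x)
{
    return x * 2;
}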


 Yes, of course, or I wouldn't have bothered researching it. But this
 research would have been easier if there were enough of a user base
 that the tools makers actually paid any attention to supporting this
 use case, is all I was saying :-).

 Of course there are presumably other platforms that we don't support
 or test on, but where we have users anyway. Building on such a
 platform sort of intrinsically requires build system hacks, and some
 equivalent to the above may well be available (e.g. I know icc
 supports -fvisibility). So I while I'm not going to do anything about
 this myself in the near future, I'd argue that it would be a good idea
 to:
  - Switch the build-system to export nothing by default when using
 gcc, using -fvisibility=hidden
  - Switch the default build to separate
  - Leave in the single-file build, but not officially supported,
 i.e., we're happy to get patches but it's not used on any systems that
 we can actually test ourselves. (I suspect it's less fragile than the
 separate build anyway, since name clashes are less common than
 forgotten include files.)

 I am fine with making the separate build the default (I have a patch
 somewhere that does that on supported platforms), but not with using
 -fvisibility=hidden. When I implemented the initial support around
 this, fvisibility was buggy on some platforms, including mingw 3.x

 It's true that mingw doesn't support -fvisibility=hidden, but that's
 because it would be a no-op; windows already works that way by
 default...

That's not my understanding: gcc behaves on windows as on linux (it
would break too much software that is the usual target of mingw
otherwise), but the -fvisibility flag is broken on gcc 3.x. The more
recent mingw supposedly handles this better, but we can't use gcc 4.x
because of another issue regarding private dll sharing :)

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] import numpy performance

2012-07-02 Thread David Cournapeau
On Mon, Jul 2, 2012 at 8:17 PM, Andrew Dalke da...@dalkescientific.com wrote:
 In this email I propose a few changes which I think are minor
 and which don't really affect the external NumPy API but which
 I think could improve the import numpy performance by at
 least 40%. This affects me because I and my clients use a
 chemistry toolkit which uses only NumPy arrays, and where
 we run short programs often on the command-line.


 In July of 2008 I started a thread about how import numpy
 was noticeably slow for one of my customers. They had
 chemical analysis software, often even run on a single
 molecular structure using command-line tools, and the
 several invocations with 0.1 seconds overhead was one of
 the dominant costs even when numpy wasn't needed.

 I fixed most of their problems by deferring numpy imports
 until needed. I remember well the Steve Jobs anecdote at
   http://folklore.org/StoryView.py?project=Macintoshstory=Saving_Lives.txt
 and spent another day of my time in 2008 to identify the
 parts of the numpy import sequence which seemed excessive.
 I managed to get the import time down from 0.21 seconds to
 0.08 seconds.

I will answer your other remarks later, but 0.21 sec to import
numpy is very slow, especially on a recent computer. It is 0.095 sec
on my mac, and 0.075 sec on a linux VM on the same computer (both hot
cache of course).

importing multiarray.so only is negligible for me (i.e. the difference
between python -c "import multiarray" and python -c "" is
statistically insignificant).

I would check external factors, like the size of your sys.path as well.

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] import numpy performance

2012-07-02 Thread David Cournapeau
On Mon, Jul 2, 2012 at 11:15 PM, Andrew Dalke da...@dalkescientific.com wrote:
 On Jul 2, 2012, at 11:38 PM, Fernando Perez wrote:
 No, that's the wrong thing to test, because it effectively amounts to
 'import numpy', sicne the numpy __init__ file is still executed.  As
 David indicated, you must import multarray.so by itself.

 I understand that clarification. However, it does not affect me.

It is indeed irrelevant to your end goal, but it does affect the
interpretation of what import_array does, and thus of your benchmark.

polynomial is definitely the big new overhead (I don't remember it
being significant last time I optimized numpy import times); it is
roughly 30 % of the total cost of importing numpy (95 -> 70 ms total
time, of which numpy went from 70 to 50 ms). Then ctypeslib and test
are the two other significant ones.

I use profile_imports.py from bzr as follows:

import sys
import profile_imports
profile_imports.install()
import numpy
profile_imports.log_stack_info(sys.stdout)

Focusing on polynomial seems the only sensible action. Except for
test, all the other stuff seems difficult to change without breaking
anything.

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Combined versus separate build

2012-07-02 Thread David Cournapeau
On Mon, Jul 2, 2012 at 11:34 PM, Nathaniel Smith n...@pobox.com wrote:


 To be clear, this subthread started with the caveat *as far as our
 officially supported platforms go* -- I'm not saying that we should
 go around and remove all the NPY_NO_EXPORT macros tomorrow.

 However, the only reason they're actually needed is for supporting
 platforms where you can't control symbol visibility from the linker,
 and AFAICT we have no examples of such platforms to hand.

I gave you one, mingw 3.x. Actually, reading a bit more around, it
seems this is not specific to mingw, but all gcc < 4
(http://gcc.gnu.org/gcc-4.0/changes.html#visibility)

 I don't have windows to test, but everyone else on the internet seems
 to think mingw works the way I said, with __declspec and all... you
 aren't thinking of cygwin, are you? (see e.g.
 http://mingw.org/wiki/sampleDLL)

Well, I did check myself, but looking more into it, I was tricked by
nm output, which makes little sense on windows w.r.t. visibility with
dlls. You can define the same function in multiple dlls; they will all
appear as a public symbol (T label with nm), but the windows linker
will not see them when linking an executable.

I am still biased toward the conservative option, especially since it
is still followed by pretty much every C extension out there
(including python itself). I trust their experience in dealing with
cross-platform issues more than ours.

I cannot find my patch for detecting platforms where this can safely
become the default, so I will prepare a new one.

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Code Freeze for NumPy 1.7

2012-07-15 Thread David Cournapeau
On Sun, Jul 15, 2012 at 5:42 PM, Charles R Harris
charlesr.har...@gmail.com wrote:


 On Sun, Jul 15, 2012 at 10:32 AM, Ralf Gommers ralf.gomm...@googlemail.com
 wrote:



 On Sun, Jul 15, 2012 at 5:57 PM, Nathaniel Smith n...@pobox.com wrote:

 On Sun, Jul 15, 2012 at 1:08 PM, Ralf Gommers
 ralf.gomm...@googlemail.com wrote:
 
 
  On Sun, Jul 15, 2012 at 12:45 AM, Travis Oliphant tra...@continuum.io
  wrote:
 
 
  Hey all,
 
  We are nearing a code-freeze for NumPy 1.7.   Are there any
  last-minute
  changes people are wanting to push into NumPy 1.7?  We should discuss
  them
  as soon as possible.
 
  I'm proposing a code-freeze at midnight UTC on July 18th (7:00pm CDT
  on
  July 17th).   This will allow the creation of beta releases of NumPy
  on the
  18th of July. This is a few days later than originally hoped for ---
  largely
  due to unexpected travel schedules of Ondrej and I, but it does give
  people
  a few more days to get patches in.  Of course, we will be able to
  apply
  bug-fixes to the 1.7.x branch once the tag is made.
 
 
  What about the tickets still open for 1.7.0
  (http://projects.scipy.org/numpy/report/3)? There are a few important
  ones
  left.
 
  These I would consider blockers:
- #2108 Datetime failures with MinGW

 Is there a description anywhere of what the problem actually is here?
 I looked at the ticket, which referred to a PR, and it's hard to work
 out from the PR discussion what the actual remaining test failures are
 -- and there definitely doesn't seem to be any description of the
 underlying problem. (Something about working 64-bit time_t on windows
 being difficult depending on the compiler used?)


 There's a lot more discussion on
 http://projects.scipy.org/numpy/ticket/1909
 https://github.com/numpy/numpy/pull/156
 https://github.com/numpy/numpy/pull/161.

 The issue is that for MinGW 3.x some _s / _t functions seem to be missing.
 And we don't yet support MinGW 4.x.

 Current issues can be seen from the last test log on our Windows XP
 buildbot (June 29,
 http://buildbot.scipy.org/builders/Windows_XP_x86/builds/1124/steps/shell_1/logs/stdio):

 ==
 ERROR: test_datetime_arange (test_datetime.TestDateTime)
 --
 Traceback (most recent call last):
   File
 C:\buildbot\numpy\b11\numpy-install\Lib\site-packages\numpy\core\tests\test_datetime.py,
 line 1351, in test_datetime_arange
 assert_raises(ValueError, np.arange, np.datetime64('today'),
 OSError: Failed to use '_localtime64_s' to convert to a local time

 ==
 ERROR: test_datetime_y2038 (test_datetime.TestDateTime)
 --
 Traceback (most recent call last):
   File
 C:\buildbot\numpy\b11\numpy-install\Lib\site-packages\numpy\core\tests\test_datetime.py,
 line 1706, in test_datetime_y2038
 a = np.datetime64('2038-01-20T13:21:14')
 OSError: Failed to use '_gmtime64_s' to convert to a UTC time

 ==
 ERROR: test_pydatetime_creation (test_datetime.TestDateTime)
 --
 Traceback (most recent call last):
   File
 C:\buildbot\numpy\b11\numpy-install\Lib\site-packages\numpy\core\tests\test_datetime.py,
 line 467, in test_pydatetime_creation
 a = np.array(['today', datetime.date.today()], dtype='M8[D]')
 OSError: Failed to use '_localtime64_s' to convert to a local time

 ==
 ERROR: test_string_parser_variants (test_datetime.TestDateTime)
 --
 Traceback (most recent call last):
   File
 C:\buildbot\numpy\b11\numpy-install\Lib\site-packages\numpy\core\tests\test_datetime.py,
 line 1054, in test_string_parser_variants
 assert_equal(np.array(['1980-02-29T01:02:03'], np.dtype('M8[s]')),
 OSError: Failed to use '_gmtime64_s' to convert to a UTC time

 ==
 ERROR: test_timedelta_scalar_construction_units
 (test_datetime.TestDateTime)
 --
 Traceback (most recent call last):
   File
 C:\buildbot\numpy\b11\numpy-install\Lib\site-packages\numpy\core\tests\test_datetime.py,
 line 287, in test_timedelta_scalar_construction_units
 assert_equal(np.datetime64('2010-03-12T17').dtype,
 OSError: Failed to use '_gmtime64_s' to convert to a UTC time

 ==
 ERROR: Failure: OSError (Failed to use '_gmtime64_s' to convert to a UTC
 time)
 --
 Traceback (most recent call last):
   File 

Re: [Numpy-discussion] Lazy imports again

2012-07-17 Thread David Cournapeau
On Mon, Jul 16, 2012 at 5:28 PM, Charles R Harris
charlesr.har...@gmail.com wrote:
 Hi All,

 Working lazy imports would be useful to have. Ralf is opposed to the idea
 because it caused all sorts of problems on different platforms when it was
 tried in scipy. I thought I'd open the topic for discussion so that folks
 who had various problems/solutions could offer input and the common
 experience could be collected in one place. Perhaps there is a solution that
 actually works.

I have never seen a lazy import system that did not cause issues in
one way or another. Lazy imports make a lot of sense for an
application (e.g. mercurial), but I think it is a mistake to solve
this at the numpy level.

This should be solved at the application level, and there are
solutions for that. For example, using the demandimport code from
mercurial (GPL) cuts down the numpy import time by 3 on my mac if one
uses np.zeros (100 ms -> 50 ms, of which 25 ms are taken by python
itself):


import demandimport
demandimport.enable()

import numpy as np

a = np.zeros(10)


To help people who need fast numpy imports, I would suggest the
following course of action:
   - start benchmarking numpy import in a per-commit manner to detect
significant regressions (like what happens with the polynomial code)
   - have a small FAQ on it, with suggestions for people who need to
optimize their short-lived scripts

cheers,

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Lazy imports again

2012-07-17 Thread David Cournapeau
On Tue, Jul 17, 2012 at 1:13 PM, Charles R Harris
charlesr.har...@gmail.com wrote:


 On Tue, Jul 17, 2012 at 1:31 AM, David Cournapeau courn...@gmail.com
 wrote:

 On Mon, Jul 16, 2012 at 5:28 PM, Charles R Harris
 charlesr.har...@gmail.com wrote:
  Hi All,
 
  Working lazy imports would be useful to have. Ralf is opposed to the
  idea
  because it caused all sorts of problems on different platforms when it
  was
  tried in scipy. I thought I'd open the topic for discussion so that
  folks
  who had various problems/solutions could offer input and the common
  experience could be collected in one place. Perhaps there is a solution
  that
  actually works.

 I have never seen a lazy import system that did not cause issues in
 one way or the other. Lazy imports make a lot of sense for an
 application (e.g. mercurial), but I think it is a mistake to solve
 this at the numpy level.

 This should be solved at the application level, and there are
 solutions for that. For example, using the demandimport code from
 mercurial (GPL) cuts down the numpy import time by 3 on my mac if one
 uses np.zeros (100ms - 50 ms, of which 25 are taken by python
 itself):

 
 import demandimport
 demandimport.enable()

 import numpy as np

 a = np.zeros(10)
 

 To help people who need fast numpy imports, I would suggest the
 following course of actions:
- start benchmarking numpy import in a per-commit manner to detect
 significant regressions (like what happens with polynomial code)
- have a small FAQ on it, with suggestion for people who need to
 optimize their short-lived script


 That's really interesting. I'd like to see some folks try that solution.

Anyone can :) The file is self-contained, last time I checked:
http://www.selenic.com/hg/file/67b8cca2f12b/mercurial/demandimport.py

cheers,

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Symbol table not found compiling numpy from git repository on Windows

2012-07-18 Thread David Cournapeau
On Wed, Jul 18, 2012 at 11:38 AM, Ondřej Čertík ondrej.cer...@gmail.com wrote:
 On Wed, Jul 18, 2012 at 12:30 PM, Ondřej Čertík ondrej.cer...@gmail.com 
 wrote:
 On Wed, Jul 18, 2012 at 2:20 AM, Ondřej Čertík ondrej.cer...@gmail.com 
 wrote:
 On Thu, Jan 5, 2012 at 8:22 PM, John Salvatier
 jsalv...@u.washington.edu wrote:
 Hello,

 I'm trying to compile numpy on Windows 7 using the command: python 
 setup.py
 config --compiler=mingw32 build but I get an error about a symbol table 
 not
 found. Anyone know how to work around this or what to look into?

 building library npymath sources
 Building msvcr library: C:\Python26\libs\libmsvcr90.a (from
 C:\Windows\winsxs\amd64_microsoft.vc90.crt_1fc8b3b9a1e18e3b_9.0.21022.8_none_750b37ff97f4f68b\msvcr90.dll)
 objdump.exe:
 C:\Windows\winsxs\amd64_microsoft.vc90.crt_1fc8b3b9a1e18e3b_9.0.21022.8_none_750b37ff97f4f68b\msvcr90.dll:
 File format not recognized
 Traceback (most recent call last):
   File setup.py, line 214, in module
 setup_package()
   File setup.py, line 207, in setup_package
 configuration=configuration )
   File C:\Users\jsalvatier\workspace\numpy\numpy\distutils\core.py, line
 186, in setup
 return old_setup(**new_attr)
   File C:\Python26\lib\distutils\core.py, line 152, in setup
 dist.run_commands()
   File C:\Python26\lib\distutils\dist.py, line 975, in run_commands
 self.run_command(cmd)
   File C:\Python26\lib\distutils\dist.py, line 995, in run_command
 cmd_obj.run()
   File
 C:\Users\jsalvatier\workspace\numpy\numpy\distutils\command\build.py, 
 line
 37, in run
 old_build.run(self)
   File C:\Python26\lib\distutils\command\build.py, line 134, in run
 self.run_command(cmd_name)
   File C:\Python26\lib\distutils\cmd.py, line 333, in run_command
 self.distribution.run_command(command)
   File C:\Python26\lib\distutils\dist.py, line 995, in run_command
 cmd_obj.run()
   File
 C:\Users\jsalvatier\workspace\numpy\numpy\distutils\command\build_src.py,
 line 152, in run
 self.build_sources()
   File
 C:\Users\jsalvatier\workspace\numpy\numpy\distutils\command\build_src.py,
 line 163, in build_sources
 self.build_library_sources(*libname_info)
   File
 C:\Users\jsalvatier\workspace\numpy\numpy\distutils\command\build_src.py,
 line 298, in build_library_sources
 sources = self.generate_sources(sources, (lib_name, build_info))
   File
 C:\Users\jsalvatier\workspace\numpy\numpy\distutils\command\build_src.py,
 line 385, in generate_sources
 source = func(extension, build_dir)
   File numpy\core\setup.py, line 646, in get_mathlib_info
 st = config_cmd.try_link('int main(void) { return 0;}')
   File C:\Python26\lib\distutils\command\config.py, line 257, in try_link
 self._check_compiler()
   File
 C:\Users\jsalvatier\workspace\numpy\numpy\distutils\command\config.py,
 line 45, in _check_compiler
 old_config._check_compiler(self)
   File C:\Python26\lib\distutils\command\config.py, line 107, in
 _check_compiler
 dry_run=self.dry_run, force=1)
   File C:\Users\jsalvatier\workspace\numpy\numpy\distutils\ccompiler.py,
 line 560, in new_compiler
 compiler = klass(None, dry_run, force)
   File
 C:\Users\jsalvatier\workspace\numpy\numpy\distutils\mingw32ccompiler.py,
 line 94, in __init__
 msvcr_success = build_msvcr_library()
   File
 C:\Users\jsalvatier\workspace\numpy\numpy\distutils\mingw32ccompiler.py,
 line 362, in build_msvcr_library
 generate_def(dll_file, def_file)
   File
 C:\Users\jsalvatier\workspace\numpy\numpy\distutils\mingw32ccompiler.py,
 line 282, in generate_def
 raise ValueError(Symbol table not found)
 ValueError: Symbol table not found


 Did you find a workaround? I am having exactly the same problem.

 So this happens both in Windows and in Wine and the problem is that
 the numpy distutils is trying to read the symbol table using objdump
 from msvcr90.dll but it can't recognize the format:

 objdump.exe: 
 C:\windows\winsxs\x86_microsoft.vc90.crt_1fc8b3b9a1e18e3b_9.0.30729.4148_none_deadbeef\msvcr90.dll:
 File format not recognized

 The file exists:


 $ file 
 ~/.wine/drive_c/windows/winsxs/x86_microsoft.vc90.crt_1fc8b3b9a1e18e3b_9.0.30729.4148_none_deadbeef/msvcr90.dll
 /home/ondrej/.wine/drive_c/windows/winsxs/x86_microsoft.vc90.crt_1fc8b3b9a1e18e3b_9.0.30729.4148_none_deadbeef/msvcr90.dll:
 PE32 executable for MS Windows (DLL) (unknown subsystem) Intel 80386
 32-bit


 But objdump doesn't work on it.

 So the following patch fixes it:


 diff --git a/numpy/distutils/mingw32ccompiler.py 
 b/numpy/distutils/mingw32ccompi
 index 5b9aa33..72ff5ed 100644
 --- a/numpy/distutils/mingw32ccompiler.py
 +++ b/numpy/distutils/mingw32ccompiler.py
 @@ -91,11 +91,11 @@ class 
 Mingw32CCompiler(distutils.cygwinccompiler.CygwinCComp
  build_import_library()

  # Check for custom msvc runtime library on Windows. Build if it 
 doesn't
 -msvcr_success = build_msvcr_library()
 -msvcr_dbg_success = build_msvcr_library(debug=True)
 -

Re: [Numpy-discussion] Segfault in mingw in test_arrayprint.TestComplexArray

2012-07-20 Thread David Cournapeau
On Fri, Jul 20, 2012 at 12:24 PM, Ondřej Čertík ondrej.cer...@gmail.com wrote:
 So I have tried the MinGW-5.0.3.exe in Wine, but it tries to install
 from some wrong url and it fails to install.
 I have unpacked the tarballs by hand into ~/.wine/drive_c/MinGW:

 Not surprising, that MinGW is really getting old. It's still the last
 available one with gcc 3.x as IIRC.

 To make things reproducible, I've put all my packages in this repository:

 https://github.com/certik/numpy-vendor



 binutils-2.17.50-20070129-1.tar.gz
 w32api-3.7.tar.gz
 gcc-g77-3.4.5-20051220-1.tar.gz
 gcc-g++-3.4.5-20051220-1.tar.gz
 gcc-core-3.4.5-20051220-1.tar.gz
 mingw-runtime-3.10.tar.gz

 also in the same directory, I had to do:

 cp ../windows/system32/msvcr90.dll lib/


 Looks like I have an older Wine, not sure if it makes a difference:

 $ locate msvcr90.dll
 /Users/rgommers/.wine/drive_c/windows/winsxs/x86_Microsoft.VC90.CRT_1fc8b3b9a1e18e3b_9.0.21022.8_x-ww_d08d0375/msvcr90.dll
 /Users/rgommers/__wine/drive_c/windows/winsxs/x86_Microsoft.VC90.CRT_1fc8b3b9a1e18e3b_9.0.21022.8_x-ww_d08d0375/msvcr90.dll

 $ locate msvcr71.dll
 /Users/rgommers/.wine/drive_c/windows/system32/msvcr71.dll
 /Users/rgommers/Code/wine/dlls/msvcr71/msvcr71.dll.fake
 /Users/rgommers/Code/wine/dlls/msvcr71/msvcr71.dll.so
 /Users/rgommers/__wine/drive_c/windows/system32/msvcr71.dll
 /Users/rgommers/wine/build/wine-1.1.39/dlls/msvcr71/msvcr71.dll.fake
 /Users/rgommers/wine/build/wine-1.1.39/dlls/msvcr71/msvcr71.dll.so
 /Users/rgommers/wine/wine-1.1.39/lib/wine/fakedlls/msvcr71.dll
 /Users/rgommers/wine/wine-1.1.39/lib/wine/msvcr71.dll.so
 /usr/local/lib/wine/fakedlls/msvcr71.dll
 /usr/local/lib/wine/msvcr71.dll.so

 Actually, I made a mistake --- the one in
 drive_c/windows/system32/msvcr90.dll does not work for me.
 The one I use is installed by the Python installer (as I found out)
 and it is in:

 drive_c/windows/winsxs/x86_Microsoft.VC90.CRT_1fc8b3b9a1e18e3b_9.0.21022.8_x-ww_d08d0375/msvcr90.dll

 Which seems to be the same as the one that you use. Just in case, I've
 put it here:

 https://github.com/certik/numpy-vendor/blob/master/msvcr90.dll





 Also I've added the bin directory to PATH using the following trick:

 $ cat > tmp <<EOF
 REGEDIT4

 [HKEY_CURRENT_USER\Environment]
 "PATH"="C:\\MinGW\\bin"
 EOF
 $ wine regedit tmp



 Then I build and installed numpy using:

 wine C:\Python27\python setup.py build --compiler=mingw32 install

 And now there is no segfault when constructing a complex array! So
 newer (newest) mingw miscompiles NumPy somehow...


 Anyway, running tests, it gets much farther then before, now it hangs at:


 test_multiarray.TestIO.test_ascii ...
 err:ntdll:RtlpWaitForCriticalSection section 0x785b7428 ? wait timed
 out in thread 0009, blocked by , retrying (60 sec)
 fixme:keyboard:X11DRV_ActivateKeyboardLayout 0x4090409, : semi-stub!
 err:ntdll:RtlpWaitForCriticalSection section 0x785b7428 ? wait timed
 out in thread 0009, blocked by , retrying (60 sec)
 err:ntdll:RtlpWaitForCriticalSection section 0x785b7428 ? wait timed
 out in thread 0009, blocked by , retrying (60 sec)
 ...

 Not sure what this problem is yet.


 This however is a big problem. I've tested it on the actual Windows
 64bit XP box, and the test simply segfaults at this place.
 Ralf, I should note, that your latest scipy RC tests also segfault on
 my Windows machine, so maybe something is wrong with the machine...

I have some good news for numpy, but bad news for you :)
  -  first, building numpy and testing mostly work for me (tried the
last commit from the 1.7.x branch) with mingw 5.0.4 and python 2.7.3, and
*without* any change in the code (i.e. I did not comment out the
part that builds the msvcr90 import library).
  - I don't know what the issue is in your environment for msvcr90, but
I can confirm that it is required. gcc 3.x, which was built around
2005/2006, cannot possibly provide the import library for msvcr90, and
the build works ok.
  - I strongly suspect some issues because you started with mingw /
gcc 4.x. If you moved some libraries into system directories, I suggest
you start fresh from a clean state in your VM (or rm -rf .wine :) ).

I noticed that when VS 2008 is available, distutils does the
configuration with MS compilers, which is broken. I will test later on
a machine without VS 2008.

cheers,

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Segfault in mingw in test_arrayprint.TestComplexArray

2012-07-20 Thread David Cournapeau
On Thu, Jul 19, 2012 at 4:58 PM, Ondřej Čertík ondrej.cer...@gmail.com wrote:


 So I have tried the MinGW-5.0.3.exe in Wine, but it tries to install
 from some wrong url and it fails to install.
 I have unpacked the tarballs by hand into ~/.wine/drive_c/MinGW:

 binutils-2.17.50-20070129-1.tar.gz
 w32api-3.7.tar.gz
 gcc-g77-3.4.5-20051220-1.tar.gz
 gcc-g++-3.4.5-20051220-1.tar.gz
 gcc-core-3.4.5-20051220-1.tar.gz
 mingw-runtime-3.10.tar.gz

 also in the same directory, I had to do:

 cp ../windows/system32/msvcr90.dll lib/

I think that's your problem right there. You should not need to do
that, and doing so will likely result in having multiple copies of the
DLL in your process (you can confirm with process dependency walker).
This should be avoided at all costs, as the python C API is not
designed to deal with this, and your crashes are pretty typical of
what happens in those cases.

David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Status of NumPy and Python 3.3

2012-07-27 Thread David Cournapeau
On Fri, Jul 27, 2012 at 7:30 AM, Travis Oliphant tra...@continuum.io wrote:
 Hey all,

 I'm wondering who has tried to make NumPy work with Python 3.3.   The Unicode 
 handling was significantly improved in Python 3.3 and the array-scalar code 
 (which assumed a certain structure for UnicodeObjects) is not working now.

 It would be nice to get 1.7.0 working with Python 3.3 if possible before the 
 release. Anyone interested in tackling that little challenge?   If 
 someone has already tried it would be nice to hear your experience.

Given that we're late with 1.7, I would suggest passing this to the
next release, unless the fix is simple (just a change of API).

cheers,

David


Re: [Numpy-discussion] Status of NumPy and Python 3.3

2012-07-27 Thread David Cournapeau
On Fri, Jul 27, 2012 at 9:28 AM, David Cournapeau courn...@gmail.com wrote:
 On Fri, Jul 27, 2012 at 7:30 AM, Travis Oliphant tra...@continuum.io wrote:
 Hey all,

 I'm wondering who has tried to make NumPy work with Python 3.3.   The 
 Unicode handling was significantly improved in Python 3.3 and the 
 array-scalar code (which assumed a certain structure for UnicodeObjects) is 
 not working now.

 It would be nice to get 1.7.0 working with Python 3.3 if possible before the 
 release. Anyone interested in tackling that little challenge?   If 
 someone has already tried it would be nice to hear your experience.

 Given that we're late with 1.7, I would suggest passing this to the
 next release, unless the fix is simple (just a change of API).

I took a brief look at it, and from the errors I have seen, one is
cosmetic; the other is a bit more involved (rewriting the
PyArray_Scalar unicode support). While it is not difficult in nature,
the current code has multiple #ifdefs on Py_UNICODE_WIDE, meaning
multiple configurations on multiple Python versions would need to
be tested.
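
For background, those #ifdefs guard the narrow/wide unicode build
distinction; a quick way to see which build a (pre-3.3) interpreter is
- just a sketch, not part of the fix:

    import sys
    # Narrow (UCS-2) builds report 0xffff, wide (UCS-4) builds 0x10ffff.
    # Python 3.3 (PEP 393) removes the distinction entirely, which is why
    # the Py_UNICODE_WIDE-guarded code needs rewriting.
    print(hex(sys.maxunicode))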

I don't think Python 3.3 support is critical - people who want to play
with beta interpreters can build numpy themselves from master, so I
am -1 on integrating this into 1.7.

I may have a fix for it by tonight, though,

David


[Numpy-discussion] Moving away from using accelerate framework on mac os x ?

2012-08-04 Thread David Cournapeau
Hi,

During the last PyCon, Olivier Grisel (of scikits-learn fame) and I
looked into a nasty bug on mac os x: https://gist.github.com/2027412.
The short story is that I believe this means numpy cannot be used with
multiprocessing if linked against the Accelerate framework, and as such
we should think about giving up on Accelerate and using e.g. ATLAS on
mac for our official binaries.
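
A minimal sketch of the kind of usage that triggers it (not the exact
gist code; it assumes numpy linked against Accelerate, and the sizes
are arbitrary):

    import numpy as np
    from multiprocessing import Pool

    def worker(_):
        # Each forked child calls back into BLAS (dgemm) after the fork.
        a = np.random.rand(200, 200)
        return np.dot(a, a).sum()

    if __name__ == "__main__":
        # Use BLAS once in the parent, then again in forked children:
        # the pattern Apple says is unsupported (see below).
        a = np.random.rand(200, 200)
        np.dot(a, a)
        pool = Pool(2)
        print(pool.map(worker, range(4)))
        pool.close()
        pool.join()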

Long story: we recently received an answer where the engineers mention
that using blas on each 'side' of a fork is not supported. The meat of
the email is attached below.

thoughts ?

David

-- Forwarded message --
From:  devb...@apple.com
Date: 2012/8/2
Subject: Bug ID 11036478: Segfault when calling dgemm with Accelerate
/ GCD after in a forked process
To: olivier.gri...@gmail.com


Hi Olivier,

Thank you for contacting us regarding Bug ID# 11036478.

Thank you for filing this bug report.

This usage of fork() is not supported on our platform.

For API outside of POSIX, including GCD and technologies like
Accelerate, we do not support usage on both sides of a fork(). For
this reason among others, use of fork() without exec is discouraged in
general in processes that use layers above POSIX.

We recommend that you either restrict usage of blas to the parent or
the child process but not both, or that you switch to using GCD or
pthreads rather than forking to create parallelism.


Re: [Numpy-discussion] Moving away from using accelerate framework on mac os x ?

2012-08-04 Thread David Cournapeau
On Sat, Aug 4, 2012 at 12:14 PM, Aron Ahmadia a...@ahmadia.net wrote:
 Hi David,

 Apple's response here is somewhat confusing, but I will add that on the
 supercomputing side of things we rarely fork, as this is not well supported
 by the vendors or the hardware (it's hard enough to performantly spawn
 500,000 processes statically; doing this dynamically becomes even more
 challenging).  This sounds like an issue in Python multiprocessing itself,
 as I guess many other Apple libraries will fail or crash with the
 fork-no-exec model.

 My suggestion would be that numpy continue to integrate with Accelerate but
 prefer a macports or brew supplied blas, if available.  This should probably
 also be filed as a wont-fix bug on the tracker so anybody who hits the same
 problem knows that it's on the system side and not us.

To be clear, I am not suggesting removing support for linking against
Accelerate, just moving away from it for our binary releases.
David


Re: [Numpy-discussion] Unicode revisited

2012-08-04 Thread David Cournapeau
On Sat, Aug 4, 2012 at 12:58 PM, Stefan Krah stefan-use...@bytereef.org wrote:
 Nathaniel Smith n...@pobox.com wrote:
 On Sat, Aug 4, 2012 at 11:42 AM, Stefan Krah stefan-use...@bytereef.org 
 wrote:
   switch (descr->byteorder) {
   case '<':
       byteorder = -1;
   case '>':
       byteorder = 1;
   default: /* '=', '|' */
       byteorder = 0;
   }

 I think you might want some breaks in here...

 Indeed. Shame on me for posting quick-and-dirty code.

Maybe we should unit-test our email too :)

David


Re: [Numpy-discussion] building numpy 1.6.2 on OSX 10.6 / Python2.7.3

2012-08-08 Thread David Cournapeau
On Wed, Aug 8, 2012 at 6:15 AM, Andrew Nelson andyf...@gmail.com wrote:
 Dear Pierre,
 as indicated yesterday OSX system python is in:

 /System/Library/Frameworks/Python.framework/

 I am installing into:

 /Library/Frameworks/Python.framework/Versions/Current/lib/python2.7/site-packages

 This should not present a problem and does not explain why numpy does not
 build/import correctly on my setup.

Please give us the build log (when rebuilding from scratch to have the
complete log) so that we can have a better idea of the issue,

David


Re: [Numpy-discussion] Licensing question

2012-08-08 Thread David Cournapeau
On Wed, Aug 8, 2012 at 12:55 AM, Nathaniel Smith n...@pobox.com wrote:
 On Mon, Aug 6, 2012 at 8:31 PM, Robert Kern robert.k...@gmail.com wrote:
 Those are not the original Fortran sources. The original Fortran sources are
 in the public domain as work done by a US federal employee.

 http://www.netlib.org/fftpack/

 Never trust the license of any code on John Burkardt's site. Track it down
 to the original sources.

 Taken together, what those websites seem to be claiming is that you
 have a choice of buggy BSD code or fixed GPL code? I assume someone
 has already taken the appropriate measures for numpy, but it seems
 like an unfortunate situation...

If the code on John Burkardt's website is based on the netlib codebase,
he is not entitled to make it GPL unless he is the sole copyright
holder of the original code.

I think the 'real' solution is to have a separate package linking to
FFTW for people with 'advanced' FFT needs. None of the other
libraries I have looked at so far are usable, fast and precise enough
once you move away from the simple case of double precision and 'well
factored' sizes.
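
To illustrate the 'well factored' point, a small sketch using plain
numpy.fft (timings left to the reader):

    import numpy as np
    # FFTPACK-style transforms are fast when the length factors into small
    # primes; for other lengths (e.g. a prime) they fall back to a much
    # slower algorithm.
    a = np.random.rand(4096)   # 2**12: well factored
    b = np.random.rand(4099)   # 4099 is prime: noticeably slower to transform
    np.fft.fft(a)
    np.fft.fft(b)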

regards,

David


Re: [Numpy-discussion] Licensing question

2012-08-08 Thread David Cournapeau
On Wed, Aug 8, 2012 at 10:53 AM, Robert Kern robert.k...@gmail.com wrote:
 On Wed, Aug 8, 2012 at 10:34 AM, David Cournapeau courn...@gmail.com wrote:
 On Wed, Aug 8, 2012 at 12:55 AM, Nathaniel Smith n...@pobox.com wrote:
 On Mon, Aug 6, 2012 at 8:31 PM, Robert Kern robert.k...@gmail.com wrote:
 Those are not the original Fortran sources. The original Fortran sources 
 are
 in the public domain as work done by a US federal employee.

 http://www.netlib.org/fftpack/

 Never trust the license of any code on John Burkardt's site. Track it down
 to the original sources.

 Taken together, what those websites seem to be claiming is that you
 have a choice of buggy BSD code or fixed GPL code? I assume someone
 has already taken the appropriate measures for numpy, but it seems
 like an unfortunate situation...

 If the code on John Burkardt website is based on the netlib codebase,
 he is not entitled to make it GPL unless he is the sole copyright
 holder of the original code.

 He can certainly incorporate the public domain code and rerelease it
 under whatever restrictions he likes, especially if he adds to it,
 which appears to be the case. The original sources are legitimately
 public domain, not just released under a liberal copyright license. He
 can't remove the original code from the public domain, but that's
 not what he claims to have done.

 I think the 'real' solution is to have a separate package linking to
 FFTW for people with 'advanced' needs for FFT. None of the other
 library I have looked at so far are usable, fast and precise enough
 when you go far from the simple case of double precision and 'well
 factored' size.

 http://pypi.python.org/pypi/pyFFTW

Nice, I am starting to get out of touch with too many packages...
Would be nice to add DCT and DST support to it.
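
Something along the lines of what scipy.fftpack already exposes would
do - a sketch using scipy only, not pyFFTW:

    import numpy as np
    from scipy.fftpack import dct, idct

    x = np.random.rand(8)
    # Type-II DCT with orthonormal scaling, and its inverse.
    y = dct(x, type=2, norm='ortho')
    print(np.allclose(idct(y, type=2, norm='ortho'), x))   # True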

David


Re: [Numpy-discussion] Vagrant VM for building NumPy (1.7.x) Windows binaries

2012-08-13 Thread David Cournapeau
Hi Ondrej,

On Mon, Aug 13, 2012 at 5:13 AM, Ondřej Čertík ondrej.cer...@gmail.com wrote:
 Hi,

 I've created this repository:

 https://github.com/certik/numpy-vendor

 which uses Vagrant and Fabric to fully automate the setup and creation
 of NumPy binaries for Windows. The setup is especially tricky;
 I've thought several times already that I nailed it, and then new
 things always pop up. One can of course install things directly in
 Ubuntu, but it's tricky, and there are a lot of things that can go wrong.
 The above approach should be 100% reproducible. So hopefully this
 repository will be useful
 for somebody new to numpy releases (like I am). Also my hope is that
 more people can help out with the release just by running it on their
 machines and/or sending PRs against this repository.

Thanks for doing this. I think vagrant is the way to go. I myself have
some stuff for native windows and vagrant (much more painful, but
sometimes necessary unfortunately).

Did you see veewee for creating vagrant boxes? It simplifies quite a few
things, but maybe they matter more on windows than on linux, where
this kind of thing is much simpler.
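
For readers who have not used Fabric, the automation boils down to
Python tasks along these lines (a made-up sketch, not the actual
numpy-vendor code; Fabric 1.x API, hypothetical task name and tag):

    from fabric.api import run

    def build_numpy(tag='master'):
        # Runs inside the Vagrant VM; wine + MinGW are assumed to be set up.
        run('git clone https://github.com/numpy/numpy.git')
        run('cd numpy && git checkout %s' % tag)
        run('cd numpy && wine C:/Python27/python setup.py build'
            ' --compiler=mingw32 bdist_wininst')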

David


Re: [Numpy-discussion] how to use numpy-vendor

2012-08-14 Thread David Cournapeau
Hi Ondrej,

On Tue, Aug 14, 2012 at 5:34 AM, Ondřej Čertík ondrej.cer...@gmail.com wrote:
 Hi,

 How should one use the vendor repository (https://github.com/numpy/vendor)
 in Wine? Should I put the binaries into .wine/drive_c/Python25/libs/,
 or somewhere else?
 I've searched all the mailing lists and I didn't find any information on it.
 I vaguely remember
 that somebody mentioned it somewhere, but I am not able to find it.
 Once I understand it,
 I'll send a PR updating the README.

There is no information on vendor: that's a repo I set up to avoid
polluting the main repo with all the binary stuff that used to be in
SVN. The principle is that it holds binaries used to *build* numpy; we
don't put anything there for end-users.

What binaries do you need to put there? Numpy binaries are usually
put on sourceforge (although I would be more than happy to have a
suggestion for a better way because uploading on sourceforge is the
very definition of pain).

David


Re: [Numpy-discussion] how to use numpy-vendor

2012-08-14 Thread David Cournapeau
On Tue, Aug 14, 2012 at 11:22 AM, Nathaniel Smith n...@pobox.com wrote:
 On Tue, Aug 14, 2012 at 11:06 AM, David Cournapeau courn...@gmail.com wrote:
 Hi Ondrej,

 On Tue, Aug 14, 2012 at 5:34 AM, Ondřej Čertík ondrej.cer...@gmail.com 
 wrote:
 Hi,

 How should one use the vendor repository (https://github.com/numpy/vendor)
 in Wine? Should I put the binaries into .wine/drive_c/Python25/libs/,
 or somewhere else?
 I've search all mailinglists and I didn't find any information on it.
 I vaguely remember
 that somebody mentioned it somewhere, but I am not able to find it.
 Once I understand it,
 I'll send a PR updating the README.

 There is no information on vendor: that's a repo I set up to avoid
 polluting the main repo with all the binary stuff that used to be in
 SVN. The principle is to put binaries used to *build* numpy, but we
 don't put anything there for end-users.

 What binaries do you need to put there ? Numpy binaries are usually
 put on sourceforge (although I would be more than happy to have a
 suggestion for a better way because uploading on sourceforge is the
 very definition of pain).

 I think he's asking how to use the binaries in numpy-vendor to build a
 release version of numpy.

Hm, good point - I don't know why I read it as putting .wine stuff into
vendor instead of the opposite.

Anyway, the way to use the binaries is to put them in some known
location, e.g. C:\local ($WINEPREFIX/drive_c/local for wine), and copy
the nosse/sse2/sse3 directories in there. For example:

C:\local\lib\yop\nosse
C:\local\lib\yop\sse2
...

These are then referred to through environment variables by the pavement
script (see https://github.com/numpy/numpy/blob/master/pavement.py#L143).
Renaming yop to atlas would be a good idea; I don't know why I left that
non-descriptive name in there.

Manually, you can just do something like ATLAS=C:\local\lib\yop\sse2
python setup.py build, but be careful about how env variables are
passed between the shell and wine (I don't remember the details). Note
that the nosse build is not ATLAS but straight netlib libs, which is why
in that case you need to use BLAS=... LAPACK=...
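
A sketch of driving that from Python, so the environment handling is
explicit (the paths follow the hypothetical layout above; wine normally
inherits the Unix environment, but double-check that on your setup):

    import os
    import subprocess

    env = dict(os.environ, ATLAS=r'C:\local\lib\yop\sse2')
    # Build with mingw against the SSE2 ATLAS; wine passes the Unix
    # environment through to the Windows process.
    subprocess.check_call(
        ['wine', 'C:\\Python27\\python', 'setup.py', 'build',
         '--compiler=mingw32'],
        env=env)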

I would strongly suggest not using openblas for this release, because
of all the issues related to CPU tuning. We could certainly update what
we have in there a bit, but building windows binaries is enough of a
pain that you don't want to do everything at once, I think; testing and
building blas on windows in particular is very time consuming.

David


[Numpy-discussion] Preventing lossy cast for new float dtypes ?

2012-08-18 Thread David Cournapeau
Hi,

I have started toying with implementing a quad precision dtype for
numpy on supported platforms, using __float128 + the quadmath lib from
gcc. I have noticed invalid (and unexpected) downcasts to long double
in some cases, especially for ufuncs (e.g. when I don't define my own
ufunc for a given operation).

Looking into the numpy ufunc machinery, I can see that the issue comes
from the assumption that long double is the highest precision possible
for a float type, and the only way I can 'fix' this is to set kind to a
value != 'f' in my dtype definition (in which case I get an expected
invalid cast exception). Is there a way to still avoid those casts
while keeping the 'f' kind?
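
To make the assumption concrete with builtin dtypes only (a sketch; the
quad dtype itself is not shown):

    import numpy as np

    # The promotion machinery treats long double ('g') as the top of the
    # 'f' hierarchy, so everything of kind 'f' is considered safely
    # castable to it:
    print(np.promote_types(np.float32, np.float64))     # float64
    print(np.promote_types(np.float64, np.longdouble))  # float128 on most x86-64 Linux builds
    print(np.can_cast(np.float64, np.longdouble))       # True
    # A user dtype of kind 'f' with more precision than long double has no
    # slot above 'g', hence the silent downcast described above.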

thanks,

David


Re: [Numpy-discussion] 64bit infrastructure

2012-08-22 Thread David Cournapeau
On Tue, Aug 21, 2012 at 12:15 AM, Chris Barker chris.bar...@noaa.gov wrote:
 On Mon, Aug 20, 2012 at 3:51 PM, Travis Oliphant tra...@continuum.io wrote:
 I'm actually not sure, why.   I think the issue is making sure that the 
 release manager can actually build NumPy without having to buy a 
 particular compiler.

 The MS Express editions, while not open source, are free-to-use, and work 
 fine.

 Not sure what to do about Fortran, though, but that's a scipy, not a
 numpy issue, yes?

Fortran is the issue. Having one or two licenses of, say, the Intel
Fortran compiler is not enough, because it makes it difficult for people
to build on top of scipy.

David


  1   2   3   4   5   6   7   8   9   10   >