Re: [Numpy-discussion] Failure to build numpy 1.6.1
On Tue, Nov 8, 2011 at 9:01 AM, David Cournapeau courn...@gmail.com wrote: Hi Mads, On Tue, Nov 8, 2011 at 8:40 AM, Mads Ipsen madsip...@gmail.com wrote: Hi, I am trying to build numpy-1.6.1 with the following gcc compiler specs:

Reading specs from /usr/lib/gcc/x86_64-redhat-linux/3.4.6/specs
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --disable-checking --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-java-awt=gtk --host=x86_64-redhat-linux
Thread model: posix
gcc version 3.4.6 20060404 (Red Hat 3.4.6-11)

I get the following error (any clues as to what goes wrong)? This looks like a compiler bug (gcc 3.4 is really old). einsum uses SSE intrinsics, and old gcc implementations are quite buggy in that area. Could you try adding the following at line 38:

#define EINSUM_USE_SSE1 0
#define EINSUM_USE_SSE2 0

I meant to add this in the file numpy/core/src/multiarray/einsum.c.src, and then rebuild numpy. David ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
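For anyone scripting this workaround into a build, a hedged sketch (GNU sed assumed; the line number 38 and the file path are exactly as given above for a numpy 1.6.1 source tree):

```shell
# Insert the two defines at line 38 of einsum.c.src, as suggested above.
# Assumes an unpacked numpy-1.6.1 source directory in the current working
# directory and GNU sed (for -i in-place editing and the i\ insert command).
f=numpy/core/src/multiarray/einsum.c.src
sed -i '38i\
#define EINSUM_USE_SSE1 0\
#define EINSUM_USE_SSE2 0' "$f"
grep -n 'EINSUM_USE_SSE' "$f"   # confirm the defines landed near line 38
```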
Re: [Numpy-discussion] Failure to build numpy 1.6.1
On Tue, Nov 8, 2011 at 9:20 AM, Mads Ipsen madsip...@gmail.com wrote: Yup, that fixes it. For now, we can apply a temporary fix on our build system. Is this something that'll go into, say, 1.6.2? That's more of a workaround than a fix. We need to decide whether we disable intrinsics altogether or whether we want to drop support for old compilers. cheers, David
Re: [Numpy-discussion] Memory hungry reduce ops in Numpy
On Mon, Nov 14, 2011 at 12:46 PM, Andreas Müller amuel...@ais.uni-bonn.de wrote: Hi everybody. When I did some normalization using numpy, I noticed that numpy.std uses more RAM than I was expecting. A quick google search gave me this: http://luispedro.org/software/ncreduce The site claims that std and other reduce operations are implemented naively with many temporaries. Is that true? And if so, is there a particular reason for that? This issue seems quite easy to fix. In particular the link I gave above provides code. The code provided only implements a few special cases: being more efficient in those cases only is indeed easy. cheers, David
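For context on why the naive reduction is memory hungry: computing std as sqrt(mean((x - x.mean())**2)) materialises temporaries the size of the input. A one-pass (Welford) alternative needs O(1) extra memory; a sketch in pure Python for clarity (so it is slow; the ncreduce code linked above does the equivalent in C for a few type/axis combinations):

```python
import math

def streaming_std(x):
    """One-pass (Welford) standard deviation: O(1) temporaries, unlike a
    naive mean/subtract/square pipeline that materialises several arrays
    the size of the input.  Population std (ddof=0), matching np.std's
    default."""
    mean = 0.0
    m2 = 0.0
    for n, v in enumerate(x, 1):
        delta = v - mean
        mean += delta / n
        m2 += delta * (v - mean)  # uses the *updated* mean
    return math.sqrt(m2 / n)

print(streaming_std([1.0, 2.0, 3.0, 4.0]))  # 1.118033988749895
```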
Re: [Numpy-discussion] Odd-looking long double on windows 32 bit
On Mon, Nov 14, 2011 at 9:01 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Sun, Nov 13, 2011 at 5:03 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Sun, Nov 13, 2011 at 3:56 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Sun, Nov 13, 2011 at 1:34 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Sun, Nov 13, 2011 at 2:25 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Sun, Nov 13, 2011 at 8:21 AM, Charles R Harris charlesr.har...@gmail.com wrote: On Sun, Nov 13, 2011 at 12:57 AM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Sat, Nov 12, 2011 at 11:35 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, Sorry for my continued confusion here. This is numpy 1.6.1 on windows XP 32 bit.

In [2]: np.finfo(np.float96).nmant
Out[2]: 52
In [3]: np.finfo(np.float96).nexp
Out[3]: 15
In [4]: np.finfo(np.float64).nmant
Out[4]: 52
In [5]: np.finfo(np.float64).nexp
Out[5]: 11

If there are 52 bits of precision, 2**53+1 should not be representable, and sure enough:

In [6]: np.float96(2**53)+1
Out[6]: 9007199254740992.0
In [7]: np.float64(2**53)+1
Out[7]: 9007199254740992.0

If the nexp is right, the max should be higher for the float96 type:

In [9]: np.finfo(np.float64).max
Out[9]: 1.7976931348623157e+308
In [10]: np.finfo(np.float96).max
Out[10]: 1.#INF

I see that long double in C is 12 bytes wide, and double is the usual 8 bytes. Sorry - sizeof(long double) is 12 using mingw. I see that long double is the same as double in MS Visual C++. http://en.wikipedia.org/wiki/Long_double but, as expected from the name:

In [11]: np.dtype(np.float96).itemsize
Out[11]: 12

Hmm, good point. There should not be a float96 on Windows using the MSVC compiler, and the longdouble types 'gG' should return float64 and complex128 respectively. OTOH, I believe the mingw compiler has real float96 types but I wonder about library support.
This is really a build issue and it would be good to have some feedback on what different platforms are doing so that we know if we are doing things right. Is it possible that numpy is getting confused by being compiled with mingw on top of a visual studio python? Some further forensics seem to suggest that, despite the fact that the math suggests float96 is float64, the storage format is in fact 80-bit extended precision: Yes, extended precision is the type on Intel hardware with gcc, the 96/128 bits comes from alignment on 4 or 8 byte boundaries. With MSVC, double and long double are both ieee double, and on SPARC, long double is ieee quad precision. Right - but I think my researches are showing that the longdouble numbers are being _stored_ as 80 bit, but the math on those numbers is 64 bit. Is there a reason that numpy can't do 80-bit math on these guys? If there is, is there any point in having a float96 on windows? It's a compiler/architecture thing and depends on how the compiler interprets the long double c type. The gcc compiler does do 80 bit math on Intel/AMD hardware. MSVC doesn't, and probably never will. MSVC shouldn't produce float96 numbers; if it does, it is a bug. Mingw uses the gcc compiler, so the numbers are there, but if it uses the MS library it will have to convert them to double to do computations like sin(x) since there are no Microsoft routines for extended precision. I suspect that the gcc/ms combo is what is producing the odd results you are seeing. I think we might be talking past each other a bit. It seems to me that, if float96 must use float64 math, then it should be removed from the numpy namespace, because If we were to do so, it would break too much code. a) It implies higher precision than float64 but does not provide it b) It uses more memory to no obvious advantage There is an obvious advantage: to handle memory blocks which use long double, created outside numpy (or even python).
Otherwise, while gcc indeed supports long double, the fact that the C runtime doesn't really means it is hopeless to reach any kind of consistency. And I will reiterate what I said before about long double: if you care about your code behaving consistently across platforms, just forget about long double. cheers, David
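A quick way to see what a given build's long double really is, following the probes used in this thread (the output is platform-dependent by design; np.longdouble is the spelling that works whether the padded type ends up named float96 or float128):

```python
import numpy as np

# Platform-dependent probe of what long double actually is.  On x86 with
# gcc, longdouble is 80-bit extended precision padded to 12 bytes (32-bit)
# or 16 bytes (64-bit); with MSVC it aliases float64; on SPARC it is IEEE
# quad.  The broken build in this thread reported float96 *storage*
# (itemsize 12) but float64 finfo (nmant 52).
ld = np.finfo(np.longdouble)
print("itemsize:", np.dtype(np.longdouble).itemsize)
print("nmant:", ld.nmant, "nexp:", ld.nexp)

# 2**53 + 1 is exactly representable only with more than float64's
# 52 mantissa bits, so this distinguishes real extended-precision math:
big = np.longdouble(2) ** 53
print("extended math:", big + np.longdouble(1) != big)
```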
Re: [Numpy-discussion] Odd-looking long double on windows 32 bit
On Tue, Nov 15, 2011 at 6:22 AM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Mon, Nov 14, 2011 at 10:08 PM, David Cournapeau courn...@gmail.com wrote: [snip - earlier thread quoted in full] It seems to me that, if float96 must use float64 math, then it should be removed from the numpy namespace, because If we were to do so, it would break too much code. David - please - obviously I'm not suggesting removing it without deprecating it.
Let's say I find it debatable that removing it (with all the deprecations) would be a good use of effort, especially given that there is no obviously better choice to be made. a) It implies higher precision than float64 but does not provide it b) It uses more memory to no obvious advantage There is an obvious advantage: to handle memory blocks which use long double, created outside numpy (or even python). Right - but that's a bit arcane, and I would have thought np.longdouble would be a good enough name for that. Of course, the users may be surprised, as I was, that memory allocated for higher precision is using float64, and that may take them some time to work out. I'll say again that 'longdouble' says to me 'something specific to the compiler' and 'float96' says
Re: [Numpy-discussion] failure to register ufunc loops for user defined types
On Sun, Dec 4, 2011 at 9:45 PM, Charles R Harris charlesr.har...@gmail.com wrote: We'll see how much interest there is. If it becomes official you may get more feedback on features. There are some advantages to having some user types in numpy. One is that otherwise they tend to get lost, another is that having a working example or two provides a template for others to work from, and finally they provide test material. Because official user types aren't assigned anywhere there might also be some conflicts. Maybe something like an extension types module would be a way around that. In any case, I think both rational numbers and quaternions would be useful to have and I hope there is some discussion of how to do that. I agree that those will be useful, but I am worried about adding more stuff in multiarray. User types should really be separated from multiarray. Ideally, they should be plugins, but separating them from multiarray would be a good first step. I realize it is a bit unfair to have this ready for Geoffray's code changes, but depending on the timelines for the 2.0.0 milestone, I think this would be a useful thing to have. Otherwise, if some ABI/API changes are needed after 2.0, we will be dragged down with this for years. I am willing to spend time on this. Geoffray, does this sound acceptable to you? David
Re: [Numpy-discussion] Slow Numpy/MKL vs Matlab/MKL
On Tue, Dec 6, 2011 at 5:31 PM, Oleg Mikulya olegmi...@gmail.com wrote: Hi, How can I make Numpy match Matlab in terms of performance? I have tried different options, using different MKL libraries and ICC versions, and still Numpy is below Matlab for certain basic tasks by ~2x. About 5 years ago I was able to get about the same speed, not anymore. Matlab is supposed to use the same MKL; what is the reason for such Numpy slowness (besides one, yet fundamental, task)? Have you checked that the returned values are the same (up to some precision)? It may be that we don't use the same underlying lapack function. cheers, David
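A sketch of the kind of check David suggests, before trusting a timing comparison (hypothetical example: for a symmetric matrix, the general and the symmetric LAPACK eigenvalue drivers should agree on the eigenvalues while usually differing a lot in speed, so identical timings across libraries only mean something once the results are confirmed equal):

```python
import numpy as np

# Before comparing timings between two linear-algebra stacks, confirm the
# two code paths return the same answer.  Here: eig (general driver) vs
# eigh (symmetric driver) on the same symmetric matrix.
rng = np.random.RandomState(0)
a = rng.rand(200, 200)
a = a + a.T  # make it symmetric so both drivers apply

w_general = np.sort(np.linalg.eig(a)[0].real)
w_symmetric = np.sort(np.linalg.eigh(a)[0])
print(np.allclose(w_general, w_symmetric))  # True
```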
Re: [Numpy-discussion] Moving to gcc 4.* for win32 installers ?
On Tue, Dec 13, 2011 at 3:43 PM, Ralf Gommers ralf.gomm...@googlemail.com wrote: On Sun, Oct 30, 2011 at 12:18 PM, David Cournapeau courn...@gmail.com wrote: On Thu, Oct 27, 2011 at 5:19 PM, Ralf Gommers ralf.gomm...@googlemail.com wrote: Hi David, On Thu, Oct 27, 2011 at 3:02 PM, David Cournapeau courn...@gmail.com wrote: Hi, I was wondering if we could finally move to a more recent version of compilers for official win32 installers. This would of course concern the next release cycle, not the ones where beta/rc are already in progress.

Basically, the pros:
- we will have to move at some point
- gcc 4.* seems less buggy, especially for C++ and fortran
- no need to maintain the msvcr90 voodoo

The cons:
- it will most likely break the ABI
- we need to recompile atlas (but I can take care of it)
- the biggest: it is difficult to combine gfortran with visual studio (more exactly you cannot link the gfortran runtime to a visual studio executable). The only solution I could think of would be to recompile the gfortran runtime with Visual Studio, which for some reason does not sound very appealing :)

To get the datetime changes to work with MinGW, we already concluded that building with 4.x is more or less required (without recognizing some of the points you list above). Changes to mingw32ccompiler to fix compilation with 4.x went in in https://github.com/numpy/numpy/pull/156. It would be good if you could check those. I will look into it more carefully, but overall, it seems that building atlas 3.8.4, numpy and scipy with gcc 4.x works quite well. The main issue is that gcc 4.* adds some dependencies on mingw dlls. There are two options:
- adding the dlls in the installers
- statically linking those, which seems to be a bad idea (generalizing the dll boundaries problem to exceptions and things we would rather not care about: http://cygwin.com/ml/cygwin/2007-06/msg00332.html)

It probably makes sense to make this move for numpy 1.7.
If this breaks the ABI then it would be easiest to make numpy 1.7 the minimum required version for scipy 0.11. My thinking as well. Hi David, what is the current status of this issue? I kind of forgot this is a prerequisite for the next release when starting the 1.7.0 release thread. The only issue at this point is the distribution of mingw dlls. I have not found a way to do it nicely (where nicely means something that is distributed within numpy package). Given that those dlls are actually versioned and seem to have a strong versioning policy, maybe we can just install them inside the python installation ? cheers, David ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Negative indexing.
On Sat, Jan 14, 2012 at 11:53 PM, Nathan Faggian nathan.fagg...@gmail.com wrote: Hi, I am finding it less than useful to have the negative index wrapping on nd-arrays. Here is a short example: import numpy as np a = np.zeros((3, 3)) a[:,2] = 1000 print a[0,-1] print a[0,-1] print a[-1,-1] In all cases 1000 is printed out. What I am after is a way to say please don't wrap around and have negative indices behave in a way I choose. I know this is a standard thing - but is there a way to override that behaviour that doesn't involve cython or rolling my own resampler? Although it could be possible with lots of work, it would most likely be a bad idea. You will need to wrap something around your model/data/etc... Could you explain a bit more what you have in mind ? David ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
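One hedged way to "roll your own" without cython is an ndarray subclass that rejects negative scalar indices; NoWrapArray below is a made-up name for illustration, and it handles only integer indices in basic tuples (not slices, arrays, or other fancy indexing), which is part of why wrapping this properly is the "lots of work" mentioned above:

```python
import numpy as np

class NoWrapArray(np.ndarray):
    """Illustrative subclass: raise IndexError on negative integer
    indices instead of wrapping around.  Only plain ints and tuples of
    ints are checked; slices pass through untouched."""
    def __getitem__(self, index):
        idx = index if isinstance(index, tuple) else (index,)
        for i in idx:
            if isinstance(i, int) and i < 0:
                raise IndexError("negative indices disabled: %d" % i)
        return np.ndarray.__getitem__(self, index)

a = np.zeros((3, 3)).view(NoWrapArray)
a[:, 2] = 1000
print(a[0, 2])   # 1000.0
try:
    a[0, -1]     # would print 1000 on a plain ndarray
except IndexError as e:
    print("refused:", e)
```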
Re: [Numpy-discussion] Moving to gcc 4.* for win32 installers ?
On Sat, Feb 4, 2012 at 3:55 PM, Ralf Gommers ralf.gomm...@googlemail.com wrote: On Wed, Dec 14, 2011 at 6:50 PM, Ralf Gommers ralf.gomm...@googlemail.com wrote: On Wed, Dec 14, 2011 at 3:04 PM, David Cournapeau courn...@gmail.com wrote: [snip - earlier thread quoted in full] The only issue at this point is the distribution of mingw dlls. I have not found a way to do it nicely (where nicely means something that is distributed within the numpy package). Given that those dlls are actually versioned and seem to have a strong versioning policy, maybe we can just install them inside the python installation ? Although not ideal, I don't have a problem with that in principle. However, wouldn't it break installing without admin rights if Python is installed by the admin? David, do you have any more thoughts on this? Is there a final solution in sight? Anything I or anyone else can do to help? I have not found a way to do it without installing the dll alongside python libraries. That brings the problem of how to install libraries there from bdist_wininst/bdist_msi installers, which I had not the time to look at. David
Re: [Numpy-discussion] fast method to to count a particular value in a large matrix
On Mon, Feb 6, 2012 at 1:17 AM, Wes McKinney wesmck...@gmail.com wrote: Whenever I get motivated enough I'm going to make a pull request on NumPy with something like khash.h and start fixing all the O(N log N) algorithms floating around that ought to be O(N). NumPy should really have a match function similar to R's and a lot of other things. khash.h is not the only thing that I'd like to use in numpy if I had more time :) David ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
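For illustration, the semantics of an R-style match can be sketched in a few lines; a plain dict stands in for the khash.h hash table, and the point is the single O(N) pass instead of a sort or a repeated scan per needle (a real NumPy version would of course be C over typed arrays):

```python
def match(needles, haystack):
    """O(N) analogue of R's match(): the index of the first occurrence of
    each needle in haystack, or -1 if absent.  One hashing pass over the
    haystack, then O(1) lookups per needle."""
    table = {}
    for i, v in enumerate(haystack):
        if v not in table:        # keep the *first* occurrence, like R
            table[v] = i
    return [table.get(v, -1) for v in needles]

print(match([30, 10, 99], [10, 20, 30, 10]))  # [2, 0, -1]
```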
Re: [Numpy-discussion] Moving to gcc 4.* for win32 installers ?
On Tue, Feb 7, 2012 at 1:30 PM, Sturla Molden stu...@molden.no wrote: On 27.10.2011 15:02, David Cournapeau wrote: - we need to recompile atlas (but I can take care of it) - the biggest: it is difficult to combine gfortran with visual studio (more exactly you cannot link gfortran runtime to a visual studio executable). Why is that? I have used gfortran with Python on Windows a lot, never had a problem. How did you link a library with mixed C and gfortran ? It's not like we are going to share CRT resources between C/Python and Fortran. That would be silly, regardless of compiler. Well, that actually happens quite a bit in the libraries we depend on. One solution could actually be removing any dependency on the fortran runtime. David ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Moving to gcc 4.* for win32 installers ?
On Tue, Feb 7, 2012 at 1:55 PM, Sturla Molden stu...@molden.no wrote: On 07.02.2012 14:38, Sturla Molden wrote: May I suggest GotoBLAS2 instead of ATLAS? Or OpenBLAS, which is GotoBLAS2 except it is still maintained. I did not know GotoBLAS2 was open source (it wasn't last time I checked). That's very useful information, I will look into it. David ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] On making Numpy 1.7 a long term support release.
On Sun, Feb 5, 2012 at 7:19 AM, Ralf Gommers ralf.gomm...@googlemail.com wrote: On Sun, Feb 5, 2012 at 7:33 AM, Travis Oliphant tra...@continuum.io wrote: I think supporting Python 2.5 and above is completely fine. I'd even be in favor of bumping up to Python 2.6 for NumPy 1.7 and certainly for NumPy 1.8 +1 for dropping Python 2.5 support also for an LTS release. That will make it a lot easier to use str.format() and the with statement (plus many other things) going forward, without having to think about if your changes can be backported to that LTS release. At the risk of sounding like a broken record, I would really like to stay at 2.4, especially for a long term release :) This is still the basis used by a lot of long-term python products. If we can support 2.4 for an LTS, I would then be much more comfortable allowing a bump to 2.5 for 1.8. David
Re: [Numpy-discussion] On making Numpy 1.7 a long term support release.
On Sat, Feb 11, 2012 at 9:08 AM, Ralf Gommers ralf.gomm...@googlemail.com wrote: On Fri, Feb 10, 2012 at 8:51 PM, Ralf Gommers ralf.gomm...@googlemail.com wrote: On Fri, Feb 10, 2012 at 10:25 AM, David Cournapeau courn...@gmail.com wrote: [snip - earlier thread quoted in full] At the very least someone should step up to do the testing or maintain a buildbot for Python 2.4 then. Also for scipy, assuming that scipy keeps supporting the same Python versions as numpy. Here's a list of Python requirements for other important scientific python projects:
- ipython: >= 2.6
- matplotlib: v1.1 supports 2.4-2.7, v1.2 will support >= 2.6
- scikit-learn: >= 2.6
- scikit-image: >= 2.5
- scikits.statsmodels: >= 2.5 (next release probably >= 2.6)
That there are still some projects/products out there that use Python 2.4 (some examples of such products would be nice by the way) is not enough of a reason by itself to continue to support it in new releases. There has to be a good reason for those products to require the very latest numpy, even though they are fine with a very old Python and older versions of almost any other Python package.
I don't think that last argument is relevant for an LTS. Numpy is used in environments where you cannot easily control what's installed. RHEL still uses python 2.4 and will be supported until 2014 in the production phase. As for projects still using python 2.4: among the most downloaded packages from this list http://taichino.appspot.com/pypi_ranking/modules?page=1, most support python 2.4 or below. lxml, zc.buildout, setuptools, pip, virtualenv and sqlalchemy do. Numpy itself is also used outside the strict scientific realm, which is why I am a bit wary about just comparing with other scientific python packages. Now, if everybody else is against it, I don't want to be a pain about it either :) David
Re: [Numpy-discussion] On making Numpy 1.7 a long term support release.
On Sat, Feb 11, 2012 at 1:30 PM, Ralf Gommers ralf.gomm...@googlemail.com wrote: As Bruce said, 29 Feb 2012 and not 2014: https://access.redhat.com/support/policy/updates/errata/ I think Bruce and I were not talking about the same RHEL version (4 vs 5). Let me see if I can set up a buildbot for 2.4. David
Re: [Numpy-discussion] Migrating issues to GitHub
On Sat, Feb 11, 2012 at 9:49 PM, Mark Wiebe mwwi...@gmail.com wrote: On Sat, Feb 11, 2012 at 3:12 PM, Eric Firing efir...@hawaii.edu wrote: On 02/11/2012 10:44 AM, Travis Oliphant wrote: [snip] 2) You must be an admin to label an issue (i.e. set it as a bug, enhancement, or so forth). A third problem is that the entire style of presentation is poorly designed from a usability standpoint, in comparison to the sourceforge tracker which mpl used previously. The github tracker appears to have been designed by a graphics person, not a software maintainer. The information density in the issue list is very low; it is impossible to scan a large number of issues at once; there doesn't seem to be any useful sorting and selection mechanism. The lack of a tabular way to mass-edit bugs is one of my biggest problems with the current trac. One thing that ideally we could do regularly is to rapidly triage 100s of bugs. Currently trac requires you to go through them one by one, like harvesting wheat with a scythe instead of a combine. Users who are mentioned in a lot of tickets also get spammed by a large number of messages, instead of getting a single summary update of all the triaging that was done. Does the github bug tracker have a good story about mass bug-updating? GitHub is better than trac on that issue: updating the milestone for many bugs at once is simple. You don't have priorities, etc., though. The REST API also makes it possible, in principle, to write tools to automate the repetitive tasks. David
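A hedged sketch of what such automation could look like against the GitHub REST API: the documented PATCH /repos/:owner/:repo/issues/:number endpoint accepts a milestone field, so many tickets can be retargeted in one scripted pass. Authentication and the actual HTTP calls are omitted here (only the requests are constructed), and the repository name and issue numbers are placeholders:

```python
import json

API = "https://api.github.com"

def retarget_milestone(owner, repo, issue_numbers, milestone_id):
    """Yield (url, json_body) pairs for a mass milestone update via
    PATCH /repos/:owner/:repo/issues/:number.  Sending them (with an
    authenticated HTTP client) is left out of this sketch."""
    for n in issue_numbers:
        url = "%s/repos/%s/%s/issues/%d" % (API, owner, repo, n)
        body = json.dumps({"milestone": milestone_id})
        yield url, body

# Placeholder issue numbers and milestone id, for illustration only:
for url, body in retarget_milestone("numpy", "numpy", [1, 2], 7):
    print(url, body)
```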
Re: [Numpy-discussion] Updated differences between 1.5.1 to 1.6.1
Hi Travis, It is great that some resources can be spent to have people paid to work on NumPy. Thank you for making that happen. I am slightly confused about the roadmaps for numpy 1.8 and 2.0. This needs discussion on the ML, and our release manager currently is Ralf - he is the one who ultimately decides what goes in when. I am also not completely comfortable with having a roadmap that does not come from the community advertised at PyCon. regards, David On Tue, Feb 14, 2012 at 9:03 AM, Travis Oliphant tra...@continuum.io wrote: For reference, here is the table that shows the actual changes between 1.5.1 and 1.6.1, at least on 64-bit platforms, in terms of type-casting. I updated the comparison code to throw out changes that are just spelling differences (i.e. where 1.6.1 chooses to create an output dtype with an 'L' character code instead of a 'Q', which on a 64-bit system is effectively the same). Mostly I'm happy with the changes (after a cursory review). As I expected, there are some real improvements. Of course, I haven't looked at the changes that occur when the scalar being used does not fit in the range of the array data-type. I don't see this change documented in the link that Mark sent previously. Is it somewhere else? Also, it looks like previously object arrays were returned for some coercions which now simply fail. Is that an expected result? At this point, I'm not going to recommend changes to 1.7 to deal with these type-casting changes --- at least this thread will serve to show some of what changed, if it bites anyone in the future. However, I will have other changes to NumPy 1.X that I will be proposing and writing (and directing other people to write as well). After some period of quiet, this might be a refreshing change. But, not all may see it that way. I'm confident that we can resolve any concerns people might have. Any feature additions will preserve backward compatibility in NumPy 1.X. Mark W.
will be helping with some of these changes, but mostly he will be working on NumPy 2.0 which we have tentatively targeted for next January. We have a tentative target for NumPy 1.8 in June/July. So far, there are three developers who will be working on NumPy 1.8 (me, Francesc Alted, and Bryan Van de Ven). Mark Wiebe is slated to help us, as well, but I would like to sponsor him as much as possible on the work for NumPy 2.0. If anyone else would like to join us, please let me know off-list. There is room for another talented person on our team. In addition to a few select features in NumPy 1.8 (a list of which will follow in a later email), we will also be working on reviewing the list of bugs on Trac and fixing them, writing tests, and improving docstrings. I would also like to improve the state of the bug-tracker and get in place a continuous integration system for NumPy. We will be advertising our NumPy 1.8 roadmap and our NumPy 2.0 roadmap at PyCon, and are working on documents that describe plans which we are hoping will be reviewed and discussed on this list. I know that having more people working on the code-base for several months will be a different scenario than what has transpired in the past. Hopefully, this will be a productive time for everybody and our sometimes different perspectives will be able to coalesce into a better result for more people. Best regards, -Travis
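Tables like the one referenced above can be regenerated mechanically by querying the promotion machinery, which makes diffing two numpy versions straightforward; a small sketch over a few illustrative dtype pairs (np.promote_types ignores values, so the scalar-out-of-range cases Travis mentions need separate probing):

```python
import numpy as np

# Query the promotion rules directly; run the same loop under two numpy
# versions and diff the output to build a change table like the one
# discussed in this thread.  These dtype pairs are illustrative only.
pairs = [("i1", "i2"), ("i1", "u1"), ("f4", "i8"), ("c8", "f8")]
for a, b in pairs:
    print(a, b, "->", np.promote_types(a, b))
```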
Re: [Numpy-discussion] [cython-users] Discussion with Guido van Rossum and (hopefully) core python-dev on scientific Python and Python3
On Mon, Feb 13, 2012 at 9:55 PM, Fernando Perez fperez@gmail.com wrote: Hi folks, [ I'm broadcasting this widely for maximum reach, but I'd appreciate it if replies can be kept to the *numpy* list, which is sort of the 'base' list for scientific/numerical work. It will make it much easier to organize a coherent set of notes later on. Apology if you're subscribed to all and get it 10 times. ] As part of the PyData workshop (http://pydataworkshop.eventbrite.com) to be held March 2 and 3 at the Mountain View Google offices, we have scheduled a session for an open discussion with Guido van Rossum and hopefully as many core python-dev members who can make it. We wanted to seize the combined opportunity of the PyData workshop bringing a number of 'scipy people' to Google with the timeline for Python 3.3, the first release after the Python language moratorium, being within sight: http://www.python.org/dev/peps/pep-0398. While a number of scientific Python packages are already available for Python 3 (either in released form or in their master git branches), it's fair to say that there hasn't been a major transition of the scientific community to Python3. Since there is no more development being done on the Python2 series, eventually we will all want to find ways to make this transition, and we think that this is an excellent time to engage the core python development team and consider ideas that would make Python3 generally a more appealing language for scientific work. Guido has made it clear that he doesn't speak for the day-to-day development of Python anymore, so we all should be aware that any ideas that come out of this panel will still need to be discussed with python-dev itself via standard mechanisms before anything is implemented. Nonetheless, the opportunity for a solid face-to-face dialog for brainstorming was too good to pass up. The purpose of this email is then to solicit, from all of our community, ideas for this discussion. 
In a week or so we'll need to summarize the main points brought up here and make a more concrete agenda out of it; I will also post a summary of the meeting afterwards here. Anything is a valid topic, some points just to get the conversation started: - Extra operators/PEP 225. Here's a summary from the last time we went over this, years ago at Scipy 2008: http://mail.scipy.org/pipermail/numpy-discussion/2008-October/038234.html, and the current status of the document we wrote about it is here: file:///home/fperez/www/site/_build/html/py4science/numpy-pep225/numpy-pep225.html. - Improved syntax/support for rationals or decimal literals? While Python now has both decimals (http://docs.python.org/library/decimal.html) and rationals (http://docs.python.org/library/fractions.html), they're quite clunky to use because they require full constructor calls. Guido has mentioned in previous discussions toying with ideas about support for different kinds of numeric literals... - Using the numpy docstring standard python-wide, and thus having python improve the pathetic state of the stdlib's docstrings? This is an area where our community is light years ahead of the standard library, but we'd all benefit from Python itself improving on this front. I'm toying with the idea of giving a lightning talk at PyCon about this, comparing the great, robust culture and tools of good docstrings across the Scipy ecosystem with the sad, sad state of docstrings in the stdlib. It might spur some movement on that front from the stdlib authors, esp. if the core python-dev team realizes the value and benefit it can bring (at relatively low cost, given how most of the information does exist, it's just in the wrong places). 
But more importantly for us, if there was truly a universal standard for high-quality docstrings across Python projects, building good documentation/help machinery would be a lot easier, as we'd know what to expect and search for (such as rendering them nicely in the ipython notebook, providing high-quality cross-project help search, etc). - Literal syntax for arrays? Sage has been floating a discussion about a literal matrix syntax (https://groups.google.com/forum/#!topic/sage-devel/mzwepqZBHnA). For something like this to go into python in any meaningful way there would have to be core multidimensional arrays in the language, but perhaps it's time to think about moving a piece of the numpy array itself into Python? This is one of the more 'out there' ideas, but after all, that's the point of a discussion like this, especially considering we'll have both Travis and Guido in one room. - Other syntactic sugar? Sage has a..b = range(a, b+1), which I actually think is both nice and useful... There's also the question of allowing a:b:c notation outside of [], which has come up a few times in conversation over the last few years. Others? - The packaging quagmire? This continues to be a problem,
Re: [Numpy-discussion] Numpy governance update
On Wed, Feb 15, 2012 at 10:30 PM, Peter Wang pw...@streamitive.com wrote: On Feb 15, 2012, at 3:36 PM, Matthew Brett wrote: Honestly - as I was saying to Alan and indirectly to Ben - any formal model - at all - is preferable to the current situation. Personally, I would say that making the founder of a company, which is working to make money from Numpy, the only decision maker on numpy - is - scary. How is this different from the situation of the last 4 years? Travis was President at Enthought, which makes money from not only Numpy but SciPy as well. In addition to employing Travis, Enthought also employs many other key contributors to Numpy and Scipy, like Robert and David. Furthermore, the Scipy and Numpy mailing lists and repos and web pages were all hosted at Enthought. If they didn't like how a particular discussion was going, they could have memory-holed the entire conversation from the archives, or worse yet, revoked commit access and reverted changes. I actually think it is somewhat different. For one, while Travis was at Enthought, he contributed much less to the discussions (by his own account), so the risk of conflict of interest was not very high. My own contributions to numpy since I have joined Enthought are close to nil as well :) There have been cases of disagreements on NumPy: in any case where the decision taken by people from one company prevailed, you would not be able to prevent people from thinking the interests of the company prevailed. In numpy, when people made a suggestion and there was not enough review, the feature generally went in. This is fundamentally different from most open source projects I am aware of, and could go bad when combined with my previous point. As far as I am concerned, the following would be enough to resolve any issues: - having one (or more) persons outside any company interest (e.g. Chuck, Pauli) with a veto. 
- no significant feature goes in without a review from people outside the organization it is coming from. David
Re: [Numpy-discussion] Proposed Roadmap Overview
Hi Travis, On Thu, Feb 16, 2012 at 10:39 PM, Travis Oliphant tra...@continuum.io wrote: Mark Wiebe and I have been discussing off and on (as well as talking with Charles) a good way forward to balance two competing desires: * addition of new features that are needed in NumPy * improving the code-base generally and moving towards a more maintainable NumPy I know there are loud voices for just focusing on the second of these and avoiding the first until we have finished that. I recognize the need to improve the code base, but I will also be pushing for improvements to the feature-set and user experience in the process. As a result, I am proposing a rough outline for releases over the next year: * NumPy 1.7 to come out as soon as the serious bugs can be eliminated. Bryan, Francesc, Mark, and I are able to help triage some of those. * NumPy 1.8 to come out in July which will have as many ABI-compatible feature enhancements as we can add while improving test coverage and code cleanup. I will post to this list more details of what we plan to address with it later. Included for possible inclusion are: * resolving the NA/missing-data issues * finishing group-by * incorporating the start of label arrays * incorporating a meta-object * a few new dtypes (variable-length string, variable-length unicode and an enum type) * adding ufunc support for flexible dtypes and possibly structured arrays * allowing generalized ufuncs to work on more kinds of arrays besides just contiguous * improving the ability for NumPy to receive JIT-generated function pointers for ufuncs and other calculation opportunities * adding filters to Input and Output * simple computed fields for dtypes * accepting a Data-Type specification as a class or JSON file * work towards improving the dtype-addition mechanism * re-factoring of code so that it can compile with a C++ compiler and be minimally dependent on Python data-structures. This is a pretty exciting list of features. 
What is the rationale for code being compiled as C++? IMO, it will be difficult to do so without preventing useful C constructs, and without removing some of the existing features (like our use of C99 complex). The subset that is both C and C++ compatible is quite constraining. cheers, David
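As a concrete illustration of David's point about the common subset (a hedged sketch added here, not code from the thread; both function names are invented): C99 `double complex` and implicit conversions from `void *` are valid C but not valid C++, so code meant to build under both compilers has to be rewritten into the shared subset.

```cpp
// Sketch: two constructs that compile as C99 but not as C++,
// next to their C++-side equivalents.
#include <cassert>
#include <cmath>
#include <complex>
#include <cstdlib>

// C99 version (invalid C++):  double complex z = re + im * I;
// C++ must use std::complex instead, which has a different API and ABI:
static double complex_abs_squared(double re, double im) {
    std::complex<double> z(re, im);
    return std::norm(z);  // re*re + im*im
}

// C version (invalid C++):  double *buf = malloc(n * sizeof(double));
// C++ rejects the implicit void* conversion, so an explicit cast is required:
static double *alloc_doubles(std::size_t n) {
    return static_cast<double *>(std::malloc(n * sizeof(double)));
}
```

The "C/C++ common subset" style would have to avoid both C99 complex and cast-free `malloc`, which is the kind of constraint being objected to.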
Re: [Numpy-discussion] Proposed Roadmap Overview
On Fri, Feb 17, 2012 at 3:39 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Fri, Feb 17, 2012 at 8:01 AM, David Cournapeau courn...@gmail.com wrote: Hi Travis, On Thu, Feb 16, 2012 at 10:39 PM, Travis Oliphant tra...@continuum.io wrote: snip What is the rationale for code being compiled as C++? IMO, it will be difficult to do so without preventing useful C constructs, and without removing some of the existing features (like our use of C99 complex). The subset that is both C and C++ compatible is quite constraining. I'm in favor of this myself, C++ would allow a lot of code cleanup and make it easier to provide an extensible base, I think it would be a natural fit with numpy. Of course, some C++ projects become tangled messes of inheritance, but I'd be very interested in seeing what a good C++ designer like Mark, intimately familiar with the numpy code base, could do. This opportunity might not come by again anytime soon and I think we should grab onto it. The initial step would be a release whose code would compile as both C and C++, which mostly comes down to removing C++ keywords like 'new'. C++ will make integration with external environments much harder (calling a C++ library from a non C++ program is very hard, especially for cross-platform projects), and I am not convinced by the more extensible argument. 
Making the numpy C code buildable by a C++ compiler is harder than removing keywords. I did suggest running it by you for build issues, so please raise any you can think of. Note that MatPlotLib is in C++, so I don't think the problems are insurmountable. And choosing a set of compilers to support is something that will need to be done. I don't know for matplotlib, but for scipy, quite a few issues were caused by our C++ extensions in scipy.sparse. But build issues are not a strong argument against C++ - I am sure those could be worked out. regards, David
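David's point that calling a C++ library from a non-C++ program usually means hiding the C++ behind a C facade can be sketched as follows (an illustrative toy added here, not numpy code; all names are invented):

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical C++ implementation detail: uses templates and STL containers,
// none of which can appear in a header consumed by a C compiler.
namespace detail {
template <typename T>
T sum(const std::vector<T>& values) {
    return std::accumulate(values.begin(), values.end(), T(0));
}
}  // namespace detail

// The C facade: plain types only, no templates, no name mangling.
// This is the only surface a C (or Fortran, or ctypes) caller ever sees.
extern "C" double toy_sum_doubles(const double* data, long n) {
    std::vector<double> values(data, data + n);
    return detail::sum(values);
}
```

The template machinery never crosses the boundary; only the mangling-free `extern "C"` symbol is exported, which is essentially the "wrap the c++ code in c" approach discussed later in the thread.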
Re: [Numpy-discussion] Proposed Roadmap Overview
I don't think c++ has any significant advantage over c for high performance libraries. I am not convinced by the number of people argument either: it is not my experience that c++ is easier to maintain in an open source context, where the level of people is far from consistent. I doubt many people did not contribute to numpy because it is in c instead of c++. While this is somewhat subjective, there are reasons that c is much more common than c++ in that context. I would much rather move most parts to cython to solve subtle ref counting issues, typically. The only way that I know of to have a stable and usable abi is to wrap the c++ code in c. Wrapping c++ libraries in python has always been a pain in my experience. How are templates or exceptions handled across languages? It will also be a significant issue on windows with open source compilers. Interestingly, the api from clang exported to other languages is in c... David On 17 Feb 2012 18:21, Mark Wiebe mwwi...@gmail.com wrote: On Fri, Feb 17, 2012 at 11:52 AM, Eric Firing efir...@hawaii.edu wrote: On 02/17/2012 05:39 AM, Charles R Harris wrote: snip
Re: [Numpy-discussion] Proposed Roadmap Overview
On 17 Feb 2012 17:58, Mark Wiebe mwwi...@gmail.com wrote: On Fri, Feb 17, 2012 at 10:27 AM, David Cournapeau courn...@gmail.com wrote: On Fri, Feb 17, 2012 at 3:39 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Fri, Feb 17, 2012 at 8:01 AM, David Cournapeau courn...@gmail.com wrote: Hi Travis, On Thu, Feb 16, 2012 at 10:39 PM, Travis Oliphant tra...@continuum.io wrote: snip
C++ will make integration with external environments much harder (calling a C++ library from a non C++ program is very hard, especially for cross-platform projects), and I am not convinced by the more extensible argument. The whole of NumPy could be written utilizing C++ extensively while still using exactly the same API and ABI numpy has now. C++ does not force anything about API/ABI design decisions. One good document to read about how a major open source project transitioned from C to C++ is about gcc. Their points comparing C and C++ apply to numpy quite well, and being compiler authors, they're intimately familiar with ABI and performance issues: http://gcc.gnu.org/wiki/gcc-in-cxx#The_gcc-in-cxx_branch Making the numpy C code buildable by a C++ compiler is harder than removing keywords. Certainly, but it's not a difficult task for someone who's familiar with both C and C++. I did suggest running it by you for build issues, so please raise any you can think of. Note that MatPlotLib is in C++, so I don't think the problems are insurmountable. And choosing a set of compilers to support is something that will need to be done. I don't know for matplotlib, but for scipy, quite a few issues were caused by our C++ extensions in scipy.sparse. But build issues are not a strong argument against C++ - I am sure those could be worked out. On this topic, I'd like to ask what it would take to change the default warning levels in all the build configurations? Building with no warnings under high warning
Re: [Numpy-discussion] Proposed Roadmap Overview
On 18 Feb 2012 00:58, Charles R Harris charlesr.har...@gmail.com wrote: On Fri, Feb 17, 2012 at 4:44 PM, David Cournapeau courn...@gmail.com wrote: I don't think c++ has any significant advantage over c for high performance libraries. I am not convinced by the number of people argument either: it is not my experience that c++ is easier to maintain in an open source context, where the level of people is far from consistent. I doubt many people did not contribute to numpy because it is in c instead of c++. While this is somewhat subjective, there are reasons that c is much more common than c++ in that context. I think C++ offers much better tools than C for the sort of things in Numpy. The compiler will take care of lots of things that now have to be hand crafted and I wouldn't be surprised to see the code size shrink by a significant factor. There are two arguments here: that the c code in numpy could be improved, and that c++ is the best way to do it. Nobody so far has argued against the first argument. I think there is a lot of space to improve things while still being in C. You say that the compiler would take care of a lot of things: so far, the main thing that has been mentioned is RAII. While it is certainly a useful concept, I find it extremely difficult to use correctly in real applications. Things that are simple to do on simple examples become really hard to deal with when features start to interact with each other (which is always in c++). Writing robust code that is exception safe with the stl requires a lot of knowledge. I don't have this knowledge. I have no doubt Mark has this knowledge. Does anyone else on this list? I would much rather move most parts to cython to solve subtle ref counting issues, typically. Not me, I'd rather write most stuff in C/C++ than Cython, C is cleaner ;) Cython is good for the Python interface, but once past that barrier C is easier, and C++ has lots of useful things. 
The only way that I know of to have a stable and usable abi is to wrap the c++ code in c. Wrapping c++ libraries in python has always been a pain in my experience. How are templates or exceptions handled across languages? It will also be a significant issue on windows with open source compilers. Interestingly, the api from clang exported to other languages is in c... The api isn't the same as the implementation language. I wouldn't prejudge these issues, but some indication of how they would be solved might be helpful. I understand that the api and implementation language are not the same: you just quoted the part where I was mentioning it :) Assuming a c++ implementation with a c api, how will you deal with templates? How will you deal with exceptions? How will you deal with exceptions crossing dll/so boundaries between different compilers, which is a very common situation in our community? david snip Chuck
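The concern about exceptions escaping across a C boundary is conventionally handled by catching everything at the `extern "C"` surface and translating it into an error code, with RAII releasing resources during the unwind. A hedged sketch of that convention (not numpy code; the names and error codes are invented for illustration):

```cpp
#include <cassert>
#include <new>
#include <stdexcept>
#include <vector>

// Hypothetical error codes for the C-visible surface.
enum { TOY_OK = 0, TOY_ERR_NOMEM = 1, TOY_ERR_INTERNAL = 2 };

// Internal C++ code may throw; RAII (the vector here) frees its memory
// automatically on every exit path, including exceptional ones.
static double mean_impl(const double* data, long n) {
    if (n <= 0) throw std::invalid_argument("empty input");
    std::vector<double> copy(data, data + n);
    double total = 0.0;
    for (double v : copy) total += v;
    return total / static_cast<double>(n);
}

// Nothing may escape: a C caller (or code built with a different compiler's
// runtime) cannot unwind C++ frames, so every exception is converted here.
extern "C" int toy_mean(const double* data, long n, double* out) {
    try {
        *out = mean_impl(data, n);
        return TOY_OK;
    } catch (const std::bad_alloc&) {
        return TOY_ERR_NOMEM;
    } catch (...) {
        return TOY_ERR_INTERNAL;
    }
}
```

This addresses the "how will you deal with exceptions" question for in-process calls through the C API; it does not by itself answer the harder cross-DLL case David raises, where the two sides may not even share a C++ runtime.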
Re: [Numpy-discussion] Proposed Roadmap Overview
On 17 Feb 2012 18:21, Mark Wiebe mwwi...@gmail.com wrote: On Fri, Feb 17, 2012 at 11:52 AM, Eric Firing efir...@hawaii.edu wrote: On 02/17/2012 05:39 AM, Charles R Harris wrote: On Fri, Feb 17, 2012 at 8:01 AM, David Cournapeau courn...@gmail.com wrote: Hi Travis, On Thu, Feb 16, 2012 at 10:39 PM, Travis Oliphant tra...@continuum.io wrote: snip I did suggest running it by you for build issues, so please raise any you can think of. 
Note that MatPlotLib is in C++, so I don't think the problems are insurmountable. And choosing a set of compilers to support is something that will need to be done. It's true that matplotlib relies heavily on C++, both via the Agg library and in its own extension code. Personally, I don't like this; I think it raises the barrier to contributing. C++ is an order of magnitude more complicated than C--harder to read, and much harder to write, unless one is a true expert. In mpl it brings reliance on the CXX library, which Mike D. has had to help maintain. And if it does increase compiler specificity, that's bad. This gets to the recruitment issue, which is one of the most important problems I see numpy facing. I personally have contributed a lot of code to NumPy *in spite of* the fact it's in C. NumPy being in C instead of C++ was the biggest negative point when I considered whether it was worth contributing to the project. I suspect there are many programmers out there who are skilled in low-level, high-performance C++, who would be willing to contribute, but don't want to code in C. This is a really important issue, because accessibility is the essential reason why I am so strongly against
Re: [Numpy-discussion] Proposed Roadmap Overview
On 18 Feb 2012 03:53, Charles R Harris charlesr.har...@gmail.com wrote: On Fri, Feb 17, 2012 at 7:29 PM, David Cournapeau courn...@gmail.com wrote: On 18 Feb 2012 00:58, Charles R Harris charlesr.har...@gmail.com wrote: On Fri, Feb 17, 2012 at 4:44 PM, David Cournapeau courn...@gmail.com wrote: snip Writing robust code that is exception safe with the stl requires a lot of knowledge. I don't have this knowledge. I have no doubt Mark has this knowledge. Does anyone else on this list? I have the sense you haven't written much in C++. Exception handling is maybe one of the weakest aspects of C, that is, it basically doesn't have any. 
The point is, I'd rather not *have* to worry much about the C/C++ side of things, and I think once a solid foundation is in place I won't have to nearly as much. Back in the late 80's I used rather nice Fortran and C++ compilers for writing code to run in extended DOS (the dos limit was 640 KB at that time). They were written in - wait for it - Pascal. The authors explained this seemingly odd decision by claiming that Pascal was better for bigger projects than C, and I agreed with them ;) Now you can point to Linux, which is 30 million + lines of C, but that is rather exceptional and the barriers to entry at this point are pretty darn high. My own experience is that beginners can seldom write more than a page of C and get it right, mostly because of pointers. Now C++ has a ton of subtleties and one needs to decide up front what parts to use and what not, but once a well designed system is in place, many things become easier because a lot of housekeeping is done for you. My own concern here is that the project is bigger than Mark thinks and he might get sucked off into a sideline, but I'd sure like to see the experiment made. I would much rather move most parts to Cython to solve subtle ref counting issues, typically. Not me, I'd rather write most stuff in C/C++ than Cython, C is cleaner ;) Cython is good for the Python interface, but once past that barrier C is easier, and C++ has lots of useful things. The only way that I know of to have a stable and usable abi is to wrap the c++ code in c. Wrapping c++ libraries in python has always been a pain in my experience. How are templates or exceptions handled across languages? It will also be a significant issue on windows with open source compilers. Interestingly, the api from clang exported to other languages is in c... The api isn't the same as the implementation language. I wouldn't prejudge these issues, but some indication of how they would be solved might be helpful. 
I understand that api and implementation language are not the same: you just quoted the part where I was mentioning it :) Assuming a c++ implementation with a c api, how will you deal with templates? How will you deal with exceptions? How will you deal with exceptions crossing dll/so boundaries between different compilers, which is a very common situation in our community? None of these strike me as relevant, I mean, they are internals, not api problems, and shouldn't be visible to the user. How Mark would implement the C++ API, as opposed to the C API I don't know, but since both would be there I don't see the problem. But really, we need more details on how these things would work. I don't understand why you think this is not relevant? If numpy is in c++, with a C API, most users of numpy C/C++ API will use the C API, at least at first, since most of them are in C. Changes or restrictions on how this API can be used are visible. To be more concrete, if numpy is built by MS compiler, and an exception is thrown
Re: [Numpy-discussion] Proposed Roadmap Overview
On 18 Feb 2012 04:37, Charles R Harris charlesr.har...@gmail.com wrote: On Fri, Feb 17, 2012 at 9:18 PM, David Cournapeau courn...@gmail.com wrote: On 18 Feb 2012 03:53, Charles R Harris charlesr.har...@gmail.com wrote: On Fri, Feb 17, 2012 at 7:29 PM, David Cournapeau courn...@gmail.com wrote: On 18 Feb 2012 00:58, Charles R Harris charlesr.har...@gmail.com wrote: On Fri, Feb 17, 2012 at 4:44 PM, David Cournapeau courn...@gmail.com wrote: I don't think c++ has any significant advantage over c for high performance libraries. I am not convinced by the number of people argument either: it is not my experience that c++ is easier to maintain in an open source context, where the level of people is far from consistent. I doubt many people did not contribute to numpy because it is in c instead of c++. While this is somewhat subjective, there are reasons that c is much more common than c++ in that context. I think C++ offers much better tools than C for the sort of things in Numpy. The compiler will take care of lots of things that now have to be hand crafted and I wouldn't be surprised to see the code size shrink by a significant factor. There are two arguments here: that c code in numpy could be improved, and that c++ is the best way to do it. Nobody so far has argued against the first argument. I think there is a lot of space to improve things while still being in C. You say that the compiler would take care of a lot of things: so far, the main thing that has been mentioned is RAII. While it is certainly a useful concept, I find it extremely difficult to use correctly in real applications. Things that are simple to do in simple examples become really hard to deal with when features start to interact with each other (which is always the case in c++). Writing robust code that is exception safe with the STL requires a lot of knowledge. I don't have this knowledge. I have no doubt Mark has this knowledge. Does anyone else on this list? 
I have the sense you haven't written much in C++. Exception handling is maybe one of the weakest aspects of C, that is, it basically doesn't have any. The point is, I'd rather not *have* to worry much about the C/C++ side of things, and I think once a solid foundation is in place I won't have to nearly as much. Back in the late 80's I used rather nice Fortran and C++ compilers for writing code to run in extended DOS (the dos limit was 640 KB at that time). They were written in - wait for it - Pascal. The authors explained this seemingly odd decision by claiming that Pascal was better for bigger projects than C, and I agreed with them ;) Now you can point to Linux, which is 30 million + lines of C, but that is rather exceptional and the barriers to entry at this point are pretty darn high. My own experience is that beginners can seldom write more than a page of C and get it right, mostly because of pointers. Now C++ has a ton of subtleties and one needs to decide up front what parts to use and what not, but once a well designed system is in place, many things become easier because a lot of housekeeping is done for you. My own concern here is that the project is bigger than Mark thinks and he might get sucked off into a sideline, but I'd sure like to see the experiment made. I would much rather move most parts to Cython to solve subtle ref counting issues, typically. Not me, I'd rather write most stuff in C/C++ than Cython, C is cleaner ;) Cython is good for the Python interface, but once past that barrier C is easier, and C++ has lots of useful things. The only way that I know of to have a stable and usable abi is to wrap the c++ code in c. Wrapping c++ libraries in python has always been a pain in my experience. How are templates or exceptions handled across languages? It will also be a significant issue on windows with open source compilers. Interestingly, the api from clang exported to other languages is in c... The api isn't the same as the implementation language. 
I wouldn't prejudge these issues, but some indication of how they would be solved might be helpful. I understand that api and implementation language are not the same: you just quoted the part where I was mentioning it :) Assuming a c++ implementation with a c api, how will you deal with templates? How will you deal with exceptions? How will you deal with exceptions crossing dll/so boundaries between different compilers, which is a very common situation in our community? None of these strike me as relevant, I mean, they are internals, not api problems, and shouldn't be visible to the user. How Mark would implement the C++ API, as opposed to the C API I don't know, but since both would be there I don't see the problem. But really, we need more details on how these things would work. I don't understand why you think this is not relevant? If numpy is in c++, with a C API, most users of numpy C/C++ API
Re: [Numpy-discussion] Proposed Roadmap Overview
On 18 Feb 2012 06:18, Christopher Jordan-Squire cjord...@uw.edu wrote: On Fri, Feb 17, 2012 at 8:30 PM, Sturla Molden stu...@molden.no wrote: On 18 Feb 2012, at 05:01, Jason Grout jason-s...@creativetrax.com wrote: On 2/17/12 9:54 PM, Sturla Molden wrote: We would have to write a C++ programming tutorial that is based on Python knowledge instead of C knowledge. I personally would love such a thing. It's been a while since I did anything nontrivial on my own in C++. One example: How do we code multiple return values? In Python: - Return a tuple. In C: - Use pointers (evilness) In C++: - Return a std::tuple, as you would in Python. - Use references, as you would in Fortran or Pascal. - Use pointers, as you would in C. C++ textbooks always pick the last... I would show the first and the second method, and perhaps intentionally forget the last. Sturla I can add my own 2 cents about cython vs. C vs. C++, based on summer coding experiences. I was an intern at Enthought, sharing an office with Mark W. (Which was a treat. I recommend you all quit your day jobs and haunt whatever office Mark is inhabiting.) I was trying to optimize some code and that led to experimenting with both cython and C. Dealing with the C internals of numpy was frustrating. Since C doesn't have templating but numpy kinda needs it, python scripts instead go over the sources and manually perform the templating. Not the most obvious thing. There were other issues in the background--including that C doesn't allow for abstraction (i.e. easy to read code), lots of pointer-fu is required, and the C API is lightly documented and already plenty difficult. Please understand that the argument is not to maintain a status quo. Lack of API documentation and internals that need significant work are certainly issues. I fail to see how writing in C++ will solve the documentation issues. On the abstraction side of things, let's agree to disagree. 
Plenty of complex projects are written in both languages, which makes this mostly a subjective matter. On the flip side, cython looked pretty...but I didn't get the performance gains I wanted, and had to spend a lot of time figuring out if it was cython, needing to add types, buggy support for numpy, or actually the algorithm. The C files generated by cython were enormous and difficult to read. They really weren't meant for human consumption. As Sturla has said, regardless of the quality of the current product, it isn't stable. Sturla represents only himself on this issue. Cython is widely held as a successful and very useful tool. Many more projects in the scipy community use Cython compared to C++. And even if it looks friendly there's magic going on under the hood. Magic means it's hard to diagnose and fix problems. At least one very smart person has told me they find cython most useful for wrapping C/C++ libraries and exposing them to python, which is a far cry from library writing. (Of course Wes McKinney, a cython evangelist, uses it all over his pandas library.) I am not very smart, but this is certainly close to what I had in mind as well :) As you know, the lack of clear abstraction between c and c python wrapping is one of the major issues in numpy. Cython is certainly one of the most capable tools out there to avoid tedious reference counting bug chasing. In comparison, there are a number of high quality, performant, open-source C++ based array libraries out there with very friendly API's. Things like eigen (http://eigen.tuxfamily.org/index.php?title=Main_Page) and Armadillo (http://arma.sourceforge.net/). They seem to have plenty of users and more devs. eigen is a typical example of code I hope numpy will never be close to. This is again quite subjective, but it also shows that we have quite different ideas on what maintainable/readable code means. Which is of course quite alright. But it means a choice needs to be made. 
If a majority of people find eigen more readable than a well written C library, then I don't think anyone can reasonably argue against going to c++. On the broader topic of recruitment...sure, cython has a lower barrier to entry than C++. But there are many, many more C++ developers and resources out there than cython resources. And it likely will stay that way for quite some time. I may not have explained it very well: my whole point is that we don't recruit people, where I understand recruit as hiring full time, professional programmers. We need more people who can casually spend a few hours - typically grad students, scientists with an itch. There is no doubt that more professional programmers know c++ compared to C. But a community project like numpy has different requirements than a professional project. David -Chris ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org
Re: [Numpy-discussion] Proposed Roadmap Overview
On 18 Feb 2012 11:25, Robert Kern robert.k...@gmail.com wrote: On Sat, Feb 18, 2012 at 04:54, Charles R Harris charlesr.har...@gmail.com wrote: I found this, which references 0mq (used by ipython) as an example of a C++ library with a C interface. It seems enums can have different sizes in C/C++, so that is something to watch. One of the ways they manage to do this is by scrupulously avoiding exceptions even in the internal, never-touches-C zone. I took a superficial look at the zeromq 2.x sources: it looks like they don't use much of the stl (beyond vector and some trivial usages of algorithm). I wonder if these are linked? FWIW, I would be fine with using such a subset in numpy. David -- Robert Kern I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth. -- Umberto Eco
Re: [Numpy-discussion] Proposed Roadmap Overview
On Sat, Feb 18, 2012 at 8:45 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Sat, Feb 18, 2012 at 1:39 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Sat, Feb 18, 2012 at 12:35 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Sat, Feb 18, 2012 at 12:21 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Sat, Feb 18, 2012 at 12:18 AM, Christopher Jordan-Squire cjord...@uw.edu wrote: On Fri, Feb 17, 2012 at 11:31 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Fri, Feb 17, 2012 at 10:18 PM, Christopher Jordan-Squire cjord...@uw.edu wrote: On Fri, Feb 17, 2012 at 8:30 PM, Sturla Molden stu...@molden.no wrote: On 18 Feb 2012, at 05:01, Jason Grout jason-s...@creativetrax.com wrote: On 2/17/12 9:54 PM, Sturla Molden wrote: We would have to write a C++ programming tutorial that is based on Python knowledge instead of C knowledge. I personally would love such a thing. It's been a while since I did anything nontrivial on my own in C++. One example: How do we code multiple return values? In Python: - Return a tuple. In C: - Use pointers (evilness) In C++: - Return a std::tuple, as you would in Python. - Use references, as you would in Fortran or Pascal. - Use pointers, as you would in C. C++ textbooks always pick the last... I would show the first and the second method, and perhaps intentionally forget the last. Sturla On the flip side, cython looked pretty...but I didn't get the performance gains I wanted, and had to spend a lot of time figuring out if it was cython, needing to add types, buggy support for numpy, or actually the algorithm. At the time, was the numpy support buggy? I personally haven't had many problems with Cython and numpy. It's not that the support WAS buggy, it's that it wasn't clear to me what was going on and where my performance bottleneck was. Even after microbenchmarking with ipython, using timeit and prun, and using the cython code visualization tool. 
Ultimately I don't think it was cython, so perhaps my comment was a bit unfair. But it was unfortunately difficult to verify that. Of course, as you say, diagnosing and solving such issues would become easier to resolve with more cython experience. The C files generated by cython were enormous and difficult to read. They really weren't meant for human consumption. Yes, it takes some practice to get used to what Cython will do, and how to optimize the output. As Sturla has said, regardless of the quality of the current product, it isn't stable. I've personally found it more or less rock solid. Could you say what you mean by it isn't stable? I just meant what Sturla said, nothing more: Cython is still 0.16, it is still unfinished. We cannot base NumPy on an unfinished compiler. Y'all mean, it has a zero at the beginning of the version number and it is still adding new features? Yes, that is correct, but it seems more reasonable to me to phrase that as 'active development' rather than 'unstable', because they take considerable care to be backwards compatible, have a large automated Cython test suite, and a major stress-tester in the Sage test suite. Matthew, No one in their right mind would build a large performance library using Cython, it just isn't the right tool. For what it was designed for - wrapping existing c code or writing small and simple things close to Python - it does very well, but it was never designed for making core C/C++ libraries and in that role it just gets in the way. I believe the proposal is to refactor the lowest levels in pure C and move some or most of the library superstructure to Cython. Go for it. The proposal of moving to a core C + cython has been discussed by multiple contributors. It is certainly a valid proposal. *I* have worked on this (npymath, separate compilation), although certainly not as much as I would have wanted to. I think much can be done in that vein. 
Using the "shut up if you don't do it" argument is a straw man (and uncalled for). Moving away from subjective considerations on how to do things, is there a way that one can see the pros/cons of each approach? For the C++ approach, I would really like to see which C++ is being considered. Once the choice is done, going back would be quite hard, so I can't see how we could go for it just because some people prefer it without very clear technical arguments. Saying that C++ is more readable, or scales better, are frankly very weak and too subjective arguments to be convincing. There are too many projects way more complex than numpy that have been done in either C or C++. David
Re: [Numpy-discussion] Proposed Roadmap Overview
On Sat, Feb 18, 2012 at 9:40 PM, Charles R Harris charlesr.har...@gmail.com wrote: Well, we already have code obfuscation (DOUBLE_your_pleasure, FLOAT_your_boat), so we might as well let the compiler handle it. Yes, those are not great, but on the other hand, it is not that fundamental an issue IMO. Iterators as we have them in NumPy are something that is clearly limited by C. Writing the neighborhood iterator is the only case where I really felt that C++ *could* be a significant improvement. I use *could* because writing iterators in C++ is hard, and they will be much harder to read (I find both boost and STL - e.g. stlport -- iterators to be close to write-only code). But there is the question of how you can make C++-based iterators available in C. I would be interested in a simple example of how this could be done, ignoring all the other issues (portability, exception, etc…). The STL is also potentially compelling, but that's where we go into my beware of the dragons area of C++. Portability loss, compilation time increase and warts are significant there. scipy.sparse.sparsetools has been a source of issues quite high compared to its proportion of the scipy code (we *do* have some hard-won experience on C++-related issues). Jim Hugunin was a keynote speaker at one of the scipy conventions. At dinner he said that if he was to do it again he would use managed code ;) I don't propose we do that, but tools do advance. In an ideal world, we would have a better language than C++ that can be spit out as C for portability. I have looked for a way to do this for as long as I have been contributing to NumPy (I have looked at ooc, D, coccinelle at various stages). I believe the best way is actually in the vein of FFTW: written in a very high level language (OCAML) for the hard part, and spitting out C. 
This is better than C++ in many ways - this is also clearly not realistic :) David
Re: [Numpy-discussion] Proposed Roadmap Overview
On Sat, Feb 18, 2012 at 10:50 PM, Sturla Molden stu...@molden.no wrote: In an ideal world, we would have a better language than C++ that can be spit out as C for portability. What about a statically typed Python? (That is, not Cython.) We just need to make the compiler :-) There are better languages than C++ that have most of the technical benefits stated in this discussion (rust and D being the most obvious ones), but whose usage is unrealistic today for various reasons: knowledge, availability on esoteric platforms, etc… A new language is completely ridiculous. David
Re: [Numpy-discussion] How a transition to C++ could work
Hi Mark, thank you for joining this discussion. On Sun, Feb 19, 2012 at 7:18 AM, Mark Wiebe mwwi...@gmail.com wrote: The suggestion of transitioning the NumPy core code from C to C++ has sparked a vigorous debate, and I thought I'd start a new thread to give my perspective on some of the issues raised, and describe how such a transition could occur. First, I'd like to reiterate the gcc rationale for their choice to switch: http://gcc.gnu.org/wiki/gcc-in-cxx#Rationale In particular, these points deserve emphasis: The C subset of C++ is just as efficient as C. C++ supports cleaner code in several significant cases. C++ makes it easier to write cleaner interfaces by making it harder to break interface boundaries. C++ never requires uglier code. I think those arguments will not be very useful: they are subjective, and unlikely to convince people who prefer C to C++. There are concerns about ABI/API interoperability and interactions with C++ exceptions. I've dealt with these types of issues on enough platforms to know that while they're important, they're a lot easier to handle than the issues with Fortran, BLAS, and LAPACK in SciPy. My experience has been that providing a C API from a C++ library is no harder than providing a C API from a C library. This needs more details. I have some experience in both areas as well, and mine is quite different. Reiterating a few examples that worry me: - how can you ensure that exceptions happening in C++ will never cross different .so/.dll? How can one make sure C++ extensions built by different compilers can work? Is not using exceptions, as is done in zeromq, acceptable? (It would be nice to find out more about the decisions made by the zeromq team about their usage of C++.) I cannot find a recent example, but I have seen errors similar to this (http://software.intel.com/en-us/forums/showthread.php?t=42940) quite a few times. - how can you expose heavily C++-based features in C? 
I would expect you would like to use templates for iterators in numpy - can you make them available to 3rd party extensions without requiring C++? It's worth comparing the possibility of C++ versus the possibility of other languages, and the ones that have been suggested for consideration are D, Cython, Rust, Fortran 2003, Go, RPython, C# and Java. The target language has to interact naturally with the CPython API. It needs to provide direct access to all the various sizes of signed int, unsigned int, and float. It needs to have mature compiler support wherever we want to deploy NumPy. Taken together, these requirements eliminate a majority of these possibilities. From these criteria, the only languages which seem to have a clear possibility for the implementation of Numpy are C, C++, and D. For D, I suspect the tooling is not mature enough, but I'm not 100% certain of that. While I agree that no other language is realistic, staying in C has the nice advantage that we can more easily use one of them if they mature (rust/D - go, rpython, C#/java can be dismissed for fundamental technical reasons right away). This is not a very strong argument against using C++, obviously. 1) Immediately after branching for 1.7, we minimally patch all the .c files so that they can build with a C++ compiler and with a C compiler at the same time. Then we rename all .c -> .cpp, and update the build systems for C++. 2) During the 1.8 development cycle, we heavily restrict C++ feature usage. But, where a feature implementation would be arguably easier and less error-prone with C++, we allow it. This is a period for learning about C++ and how it can benefit NumPy. 3) After the 1.8 release, the community will have developed more experience with C++, and will be in a better position to discuss a way forward. A step that would be useful sooner rather than later is one where numpy has been split into smaller extensions (instead of multiarray/ufunc, essentially). 
This would help avoid recompilation of lots of code for any small change. It is already quite painful with C, but with C++, it will be unbearable. This can be done in C, and would be useful whether the decision to move to C++ is accepted or not. cheers, David
Re: [Numpy-discussion] Proposed Roadmap Overview
On Sun, Feb 19, 2012 at 8:08 AM, Mark Wiebe mwwi...@gmail.com wrote: On Sat, Feb 18, 2012 at 4:24 PM, David Cournapeau courn...@gmail.com wrote: On Sat, Feb 18, 2012 at 9:40 PM, Charles R Harris charlesr.har...@gmail.com wrote: Well, we already have code obfuscation (DOUBLE_your_pleasure, FLOAT_your_boat), so we might as well let the compiler handle it. Yes, those are not great, but on the other hand, it is not that fundamental an issue IMO. Iterators as we have them in NumPy are something that is clearly limited by C. Writing the neighborhood iterator is the only case where I really felt that C++ *could* be a significant improvement. I use *could* because writing iterators in C++ is hard, and they will be much harder to read (I find both boost and STL - e.g. stlport -- iterators to be close to write-only code). But there is the question of how you can make C++-based iterators available in C. I would be interested in a simple example of how this could be done, ignoring all the other issues (portability, exception, etc…). The STL is also potentially compelling, but that's where we go into my beware of the dragons area of C++. Portability loss, compilation time increase and warts are significant there. scipy.sparse.sparsetools has been a source of issues quite high compared to its proportion of the scipy code (we *do* have some hard-won experience on C++-related issues). These standard library issues were definitely valid 10 years ago, but all the major C++ compilers have great C++98 support now. The STL varies significantly between platforms, and I believe this is still the case today. Do you know the status of the STL on Blue Gene, or on small devices? We unfortunately cannot restrict ourselves to one well known implementation (e.g. STLPort). Is there a specific target platform/compiler combination you're thinking of where we can do tests on this? 
I don't believe the compile times are as bad as many people suspect, can you give some simple examples of things we might do in NumPy you expect to compile slower in C++ vs C? Switching from gcc to g++ on the same codebase should not change compilation times much. We should test, but that's not what worries me. What worries me is when we start using C++ specific code, the STL and co. Today, scipy.sparse.sparsetools takes half of the build time of the whole scipy, and it does not even use fancy features. It also takes gigabytes of RAM when building in parallel. David
Re: [Numpy-discussion] How a transition to C++ could work
On Sun, Feb 19, 2012 at 9:19 AM, Mark Wiebe mwwi...@gmail.com wrote: On Sun, Feb 19, 2012 at 2:56 AM, David Cournapeau courn...@gmail.com wrote: Hi Mark, thank you for joining this discussion. On Sun, Feb 19, 2012 at 7:18 AM, Mark Wiebe mwwi...@gmail.com wrote: The suggestion of transitioning the NumPy core code from C to C++ has sparked a vigorous debate, and I thought I'd start a new thread to give my perspective on some of the issues raised, and describe how such a transition could occur. First, I'd like to reiterate the gcc rationale for their choice to switch: http://gcc.gnu.org/wiki/gcc-in-cxx#Rationale In particular, these points deserve emphasis: The C subset of C++ is just as efficient as C. C++ supports cleaner code in several significant cases. C++ makes it easier to write cleaner interfaces by making it harder to break interface boundaries. C++ never requires uglier code. I think those arguments will not be very useful: they are subjective, and unlikely to convince people who prefer C to C++. They are arguments from a team which implements both a C and a C++ compiler. In the spectrum of possible authorities on the matter, they rate about as high as I can imagine. There are quite a few people who are as authoritative and think those arguments are not very strong. They are as unlikely to change your mind as the gcc arguments are unlikely to convince me, I am afraid. This is a necessary part of providing a C API, and is included as a requirement of doing that. All C++ libraries which expose a C API deal with this. The only two examples given so far of a C library around C++ code (clang and zeromq) do not use exceptions. Can you provide an example of a C++ library that has a C API and does use exceptions? If not, I would like to know the technical details if you don't mind expanding on them. How can one make sure C++ extensions built by different compilers can work? This is no different from the situation in C. 
Already in C on Windows, one can't build NumPy with a different version of Visual C++ than the one used to build CPython. This is a different situation. On windows, the mismatch between VS versions is due to the way win32 has been used by python itself - it could actually be fixed eventually by python (there are efforts in that regard). It is not a language issue. Except for that case, numpy has a pretty good record of allowing people to mix and match compilers. Using mingw on windows and intel compilers on linux are the typical cases, but not the only ones. I would expect you would like to use templates for iterators in numpy - can you make them available to 3rd party extensions without requiring C++? Yes, something like the nditer is a good example. From C, it would have to retain an API in the current style, but C++ users could gain an easier-to-use variant. Providing an official C++ library on top of the current C API would certainly be nice for people who prefer C++ to C. But this is quite different from using C++ at the core. The current way iterators work would be very hard (if at all possible?) to rewrite in idiomatic C++ while keeping even API compatibility with the existing C one. For numpy 2.0, we can somehow relax on this. If it is not too time consuming, could you show a simplified example of how it would work to write the iterator in C++ while providing a C API in the spirit of what we have now? David
Re: [Numpy-discussion] Proposed Roadmap Overview
On Sun, Feb 19, 2012 at 9:28 AM, Mark Wiebe mwwi...@gmail.com wrote: Is there anyone who uses a blue gene or small device which needs up-to-date numpy support, that I could talk to directly? We really need a list of supported platforms on the numpy wiki we can refer to when discussing this stuff, it all seems very nebulous to me. They may not need an up to date numpy version now, but if stopping support for them is a requirement for C++, it must be kept in mind. I actually suspect Travis to have more details on the big iron side of things. On the small side of things: http://projects.scipy.org/numpy/ticket/1969 This may not seem very useful - but that's part of what an open source project is all about in my mind. Particular styles of using templates can cause this, yes. To properly do this kind of advanced C++ library work, it's important to think about the big-O behavior of your template instantiations, not just the big-O behavior at run-time. C++ templates have a turing-complete language (which is said to be quite similar to haskell, but spelled vastly differently) running at compile time in them. This is what gives template metaprogramming in C++ great power, but since templates weren't designed for this style of programming originally, template metaprogramming is not very easy. scipy.sparse.sparsetools is quite straightforward in its usage of templates (it would be great if you could suggest improvements BTW, e.g. scipy/sparse/sparsetools/csr.h), and does not by itself use any template metaprogramming. I like that numpy can be built in a few seconds (at least without optimization), and consider this to be a useful feature. cheers, David
Re: [Numpy-discussion] How a transition to C++ could work
On Sun, Feb 19, 2012 at 9:52 AM, Mark Wiebe mwwi...@gmail.com wrote: On Sun, Feb 19, 2012 at 3:10 AM, Ben Walsh ben_w_...@yahoo.co.uk wrote: The suggestion of transitioning the NumPy core code from C to C++ has sparked a vigorous debate, and I thought I'd start a new thread to give my perspective on some of the issues raised, and describe how such a transition could occur. First, I'd like to reiterate the gcc rationale for their choice to switch: http://gcc.gnu.org/wiki/gcc-in-cxx#Rationale In particular, these points deserve emphasis: - The C subset of C++ is just as efficient as C. - C++ supports cleaner code in several significant cases. - C++ makes it easier to write cleaner interfaces by making it harder to break interface boundaries. - C++ never requires uglier code. I think they're trying to solve a different problem. I thought the problem that numpy was trying to solve is to make inner loops of numerical algorithms very fast. C is great for this because you can write C code and picture precisely what assembly code will be generated. What you're describing is also the C subset of C++, so your experience applies just as well to C++! C++ removes some of this advantage -- now there is extra code generated by the compiler to handle constructors, destructors, operators etc which can make a material difference to fast inner loops. So you end up just writing C-style anyway. This is in fact not true, and writing in C++ style can often produce faster code. A classic example of this is C qsort vs C++ std::sort. You may be thinking of using virtual functions in a class hierarchy, where a tradeoff between performance and run-time polymorphism is being made.
Emulating the functionality that virtual functions provide in C will give similar performance characteristics as the C++ language feature itself. On the other hand, if your problem really is write lots of OO code with virtual methods and have it turned into machine code (probably like the GCC guys) then maybe C++ is the way to go. Managing the complexity of the dtype subsystem, the ufunc subsystem, the nditer component, and other parts of NumPy could benefit from C++ Not in a stereotypical OO code with virtual methods way, that is not how typical modern C++ is done. Some more opinions on C++: http://gigamonkeys.wordpress.com/2009/10/16/coders-c-plus-plus/ Sorry if this all seems a bit negative about C++. It's just been my experience that C++ adds complexity while C keeps things nice and simple. Yes, there are lots of negative opinions about C++ out there, it's true. Just like there are negative opinions about C, Java, C#, and any other language which has become popular. My experience with regard to complexity and C vs C++ is that C forces the complexity of dealing with resource lifetimes out into all the code everyone writes, while C++ allows one to encapsulate that sort of complexity into a class which is small and more easily verifiable. This is about code quality, and the best quality C++ code I've worked with has been way easier to program in than the best quality C code I've worked with. While I actually believe this to be true (very good C++ can be easier to read/use than very good C), good C is also much more common than good C++, at least in open source. On the good C++ codebases you have been working on, could you rely on everybody being a very good C++ programmer ? Because this will most likely never happen for numpy. This is the crux of the argument from an organizational POV: the variance in C++ code quality is much more difficult to control.
I have seen C++ code that is certainly much poorer and more complex than numpy's, to a point where not much could be done to save the codebase. cheers, David
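The C qsort vs C++ std::sort comparison above is about indirection: qsort calls its comparator through a function pointer on every comparison, while std::sort can inline the comparison. A loose Python analogy (not the C/C++ case itself, just the same indirection effect) is sorting with the built-in comparison versus a per-pair callback:

```python
import random
import timeit
from functools import cmp_to_key

data = [random.random() for _ in range(10000)]

# built-in comparison, loosely analogous to std::sort inlining operator<
t_direct = timeit.timeit(lambda: sorted(data), number=20)

# per-pair callback, loosely analogous to qsort calling through a pointer
cmp = cmp_to_key(lambda a, b: (a > b) - (a < b))
t_indirect = timeit.timeit(lambda: sorted(data, key=cmp), number=20)

# the callback version pays a call overhead on every comparison
assert t_indirect > t_direct
```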
Re: [Numpy-discussion] [Numpy] quadruple precision
On Wed, Feb 29, 2012 at 10:22 AM, Paweł Biernat pw...@wp.pl wrote: I am completely new to Numpy and I know only the basics of Python, to this point I was using Fortran 03/08 to write numerical code. However, I am starting a new large project of mine and I am looking forward to using Python to call some low level Fortran code responsible for most of the intensive number crunching. In this context I stumbled into f2py and it looks just like what I need, but before I start writing an app in a mixture of Python and Fortran I have a question about numerical precision of variables used in numpy and f2py. Is there any way to interact with Fortran's real(16) (supported by gcc and Intel's ifort) data type from numpy? By real(16) I mean the binary128 type as in IEEE 754. (In C this data type is experimentally supported as __float128 (gcc) and _Quad (Intel's icc).) I have investigated the float128 data type, but it seems to work as binary64 or binary80 depending on the architecture. If there is currently no way to interact with binary128, how hard would it be to patch the sources of numpy to add such data type? I am interested only in basic stuff, comparable in functionality to libmath. As said before, I have little knowledge of Python, Numpy and f2py; I am, however, interested in investing some time in learning it and implementing the mentioned features, but only if there is any hope of succeeding. Numpy does not have proper support for quadruple precision float numbers, because very few implementations do (no common CPU handles it in hardware, for example). The float128 dtype is a bit confusingly named: the 128 refers to the padding in memory, not its real precision. It often (but not always) refers to the long double of the underlying C implementation. The latter depends on the OS, CPU and compilers.
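One way to see this from Python is to compare the dtype's itemsize (storage, padding included) with the mantissa width np.finfo reports; a quick check, with results that are platform dependent:

```python
import numpy as np

ld = np.dtype(np.longdouble)
info = np.finfo(np.longdouble)

# itemsize is the storage size in bytes, padding included (8, 12 or 16
# depending on platform); nmant is the number of mantissa bits actually
# used: IEEE binary128 would have nmant == 112, x86 80-bit extended has
# 63, and where long double is plain double it is 52.
print(ld.itemsize, info.nmant)

assert ld.itemsize >= np.dtype(np.float64).itemsize
assert info.nmant >= 52
```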
Re: [Numpy-discussion] C++ Example
On Sat, Mar 3, 2012 at 8:07 AM, Luis Pedro Coelho l...@cmu.edu wrote: Hi, I sort of missed the big C++ discussion, but I'd like to give some examples of how writing code can become much simpler if you are based on C++. This is from my mahotas package, which has a thin C++ wrapper around numpy's C API https://github.com/luispedro/mahotas/blob/master/mahotas/_morph.cpp and it implements multi-type greyscale erosion.

// numpy::aligned_array wraps PyArrayObject*
template<typename T>
void erode(numpy::aligned_array<T> res, numpy::aligned_array<T> array, numpy::aligned_array<T> Bc) {
    // Release the GIL using RAII
    gil_release nogil;
    const int N = res.size();
    typename numpy::aligned_array<T>::iterator iter = array.begin();
    // this is adapted from scipy.ndimage.
    // it implements the convolution-like filtering.
    filter_iterator<T> filter(res.raw_array(), Bc.raw_array(), EXTEND_NEAREST, is_bool(T()));
    const int N2 = filter.size();
    T* rpos = res.data();
    for (int i = 0; i != N; ++i, ++rpos, filter.iterate_both(iter)) {
        T value = std::numeric_limits<T>::max();
        for (int j = 0; j != N2; ++j) {
            T arr_val = T();
            filter.retrieve(iter, j, arr_val);
            value = std::min<T>(value, erode_sub(arr_val, filter[j]));
        }
        *rpos = value;
    }
}

If you compare this with the equivalent scipy.ndimage function, which is very good C code (but mostly write-only—in fact, ndimage has not been maintainable because it is so hard [at least for me, I've tried]): The fact that this is good C is a matter of opinion :) I don't think the code is comparable either - some of the stuff done in the C code is done in the C++ code you are calling. The C code could be significantly improved. Even more important here: almost none of this code should be written anymore anyway, C++ or not. This is really the kind of code that should be done in cython, as it is mostly about wrapping C code into the python C API. cheers, David
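For reference, what the erosion loop above computes is a minimum filter with nearest-edge extension. A slow but self-contained pure-NumPy sketch of the same operation (scipy.ndimage provides the production version):

```python
import numpy as np

def grey_erode(img, size=3):
    """Greyscale erosion: minimum over a size x size neighborhood,
    with edge replication (EXTEND_NEAREST in the code above)."""
    r = size // 2
    padded = np.pad(img, r, mode='edge')
    out = np.empty_like(img)
    rows, cols = img.shape
    for i in range(rows):
        for j in range(cols):
            out[i, j] = padded[i:i + size, j:j + size].min()
    return out

a = np.arange(9.0).reshape(3, 3)
e = grey_erode(a)
# the corner next to the smallest value stays 0; the opposite corner
# becomes the minimum of its (edge-replicated) neighborhood
assert e[0, 0] == 0.0 and e[2, 2] == 4.0
```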
Re: [Numpy-discussion] dtype comparison, hash
On Tue, Jan 17, 2012 at 9:28 AM, Robert Kern robert.k...@gmail.com wrote: On Tue, Jan 17, 2012 at 05:11, Andreas Kloeckner li...@informa.tiker.net wrote: Hi Robert, On Fri, 30 Dec 2011 20:05:14 +, Robert Kern robert.k...@gmail.com wrote: On Fri, Dec 30, 2011 at 18:57, Andreas Kloeckner li...@informa.tiker.net wrote: Hi Robert, On Tue, 27 Dec 2011 10:17:41 +, Robert Kern robert.k...@gmail.com wrote: On Tue, Dec 27, 2011 at 01:22, Andreas Kloeckner li...@informa.tiker.net wrote: Hi all, Two questions: - Are dtypes supposed to be comparable (i.e. implement '==', '!=')? Yes. - Are dtypes supposed to be hashable? Yes, with caveats. Strictly speaking, we violate the condition that objects that equal each other should hash equal since we define == to be rather free. Namely, np.dtype(x) == x for all objects x that can be converted to a dtype. np.dtype(float) == np.dtype('float') np.dtype(float) == float np.dtype(float) == 'float' Since hash(float) != hash('float') we cannot implement np.dtype.__hash__() to follow the stricture that objects that compare equal should hash equal. However, if you restrict the domain of objects to just dtypes (i.e. only consider dicts that use only actual dtype objects as keys instead of arbitrary mixtures of objects), then the stricture is obeyed. This is a useful domain that is used internally in numpy. Is this the problem that you found? Thanks for the reply. It doesn't seem like this is our issue--instead, we're encountering two different dtype objects that claim to be float64, compare as equal, but don't hash to the same value. I've asked the user who encountered the issue to investigate, and I'll be back with more detail in a bit. I think we've run into this before and tried to fix it. Try to find the version of numpy the user has and a minimal example, if you can. This is what Thomas found: http://projects.scipy.org/numpy/ticket/2017 It looks like the .flags attribute is different between np.uintp and np.uint32.
The .flags attribute forms part of the hashed information about the dtype (or PyArray_Descr at the C-level).

>>> np.dtype(np.uintp).flags
1536
>>> np.dtype(np.uint32).flags
2048

The same goes for np.intp and np.int32 in numpy 1.6.1 on OS X, so unlike the comment in the ticket, they do have different hashes for me. However, diving through the source a bit, I'm not entirely sure I trust the values being given at the Python level. It appears that the flag member of the PyArray_Descr struct is declared as a char. However, it is exposed as a T_INT member in the PyMemberDef table by direct addressing. Basically, a Python descriptor gets added to the np.dtype type that will look up sizeof(long) bytes from the starting position of the flags member in the struct. This includes 3 bytes of the following type_num member. Obviously, 2048 does not fit into a char. Nonetheless, the type_num is also part of the hash, so either the flags member or the type_num member is different between the two. Two bugs for the price of one! Good catch ! So basically, the flag was changed from a char to an int and back to a char, and some of the code did not follow. I could not really follow the exact history from the log alone, but basically: - there is indeed a char vs int discrepancy (T_INT vs char) - in most dtype functions handling the flag variable, temporary computations were made with an int (but every possible flag combination can fit in a char) - quite a few usages of "i" instead of "c" in PyArg_ParseTuple and Py_BuildValue. Even after all those things, the original bug is there, because uintp and uint32 have different typenums, even in 32 bits. I would actually consider this a bug in PyArray_EquivTypes, but changing this now may be quite disruptive. Shall I remove type_num from the hash input (in which case the bug would be fixed) ? David
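The hashing contract Robert describes can be checked directly from Python: restricted to actual dtype objects, equal dtypes do hash equal, even though == also accepts anything that merely converts to a dtype:

```python
import numpy as np

d1 = np.dtype(np.float64)
d2 = np.dtype('float64')

# within the domain of dtype objects, the hash contract holds
assert d1 == d2
assert hash(d1) == hash(d2)

# but == is deliberately permissive: a dtype also compares equal to
# anything convertible to it, whose own hash numpy cannot control
assert d1 == float
assert d1 == 'float64'
```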
Re: [Numpy-discussion] Fixing PyArray_Descr flags member size, ABI vs pickling issue
On Tue, Mar 6, 2012 at 6:20 AM, Robert Kern robert.k...@gmail.com wrote: On Tue, Mar 6, 2012 at 03:53, David Cournapeau courn...@gmail.com wrote: Hi, This is following the discussion on bug http://projects.scipy.org/numpy/ticket/2017 Essentially, there is a discrepancy between the actual type of the flags member in the dtype C structure (char) and how it is declared in the descriptor table (T_INT). The problem is that we are damned if we fix it, damned if we don't: - fixing T_INT to T_BYTE: flags in python is now fixed, but it breaks pickled numpy arrays Is the problem that T_BYTE returns a single-item string? Yes (although it is actually what we want, instead of an int). My handwrapping skills are rusty (Cython has blissfully eradicated this from my memory), but aren't these T_INT/T_BYTE things just convenient shortcuts for exposing C struct members as Python attributes? Couldn't we just write a full getter function for returning the correct value, just returned as a Python int instead of a str? You're right, I did not think about this solution. That's certainly better than the two I suggested. Thanks, David
Re: [Numpy-discussion] Fixing PyArray_Descr flags member size, ABI vs pickling issue
On Tue, Mar 6, 2012 at 1:25 PM, Travis Oliphant tra...@continuum.io wrote: Why do we want to return a single string char instead of an int? There is a need for more flags on the dtype object. Using an actual attribute call seems like the way to go. This could even merge the contents of two struct members so that we can add more flags but preserve ABI compatibility. Yes. The T_BYTE/T_INT is actually pretty minor compared to the underlying issue (where we cast back and forth between int and char). I will make a new PR that fixes everything but this exact point, and will put an actual accessor if needed. Given that dtype.flags is nonsensical as of today (at the python level), I would expect nobody uses it. cheers, David
Re: [Numpy-discussion] (2012) Accessing LAPACK and BLAS from the numpy C API
On Tue, Mar 6, 2012 at 2:57 PM, Sturla Molden stu...@molden.no wrote: On 05.03.2012 14:26, V. Armando Solé wrote: In 2009 there was a thread in this mailing list concerning the access to BLAS from C extension modules. If I have properly understood the thread: http://mail.scipy.org/pipermail/numpy-discussion/2009-November/046567.html the answer by then was that those functions were not exposed (only f2py functions). I just wanted to know if the situation has changed since 2009 because it is not uncommon that to optimize some operations one has to sooner or later access BLAS functions that are already wrapped in numpy (either from ATLAS, from the Intel MKL, ...) Why do you want to do this? It does not make your life easier to use NumPy or SciPy's Python wrappers from C. Just use BLAS directly from C instead. Of course it does make his life easier. This way he does not have to distribute his own BLAS/LAPACK/etc... Please stop presenting as truth things which are at best highly opinionated. You already made such statements many times, and it is not helpful at all. David
Re: [Numpy-discussion] Fixing PyArray_Descr flags member size, ABI vs pickling issue
On Tue, Mar 6, 2012 at 1:44 PM, Robert Kern robert.k...@gmail.com wrote: On Tue, Mar 6, 2012 at 18:25, Travis Oliphant tra...@continuum.io wrote: Why do we want to return a single string char instead of an int? I suspect just to ensure that any provided value fits in the range 0..255. But that's easily done explicitly. That was not even the issue in the end, my initial analysis was wrong. In any case, I have now a new PR that fixes both the dtype.flags value and the dtype hashing reported in #2017: https://github.com/numpy/numpy/pull/231 regards, David
Re: [Numpy-discussion] float96 on windows32 is float64?
On Thu, Mar 15, 2012 at 11:10 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, Am I right in thinking that float96 on windows 32 bit is a float64 padded to 96 bits? Yes If so, is it useful? Yes: this is what allows you to use dtype to parse complex binary files directly in numpy without having to care so much about those details. And that's how it is defined on windows in any case (the C standard only forces you to have sizeof(long double) >= sizeof(double)). Has anyone got a windows64 box to check float128 ? Too lazy to check on my vm, but I am pretty sure it is 16 bytes on windows 64. David
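The "parse binary files directly" use case works because a dtype describes storage layout, padding included. A small sketch with a hypothetical 24-byte on-disk record (field names, offsets and sizes are made up for illustration):

```python
import numpy as np

# hypothetical record layout: int32 at offset 0, float64 at offset 8,
# 24 bytes per record overall, so the tail is padding
rec = np.dtype({'names': ['id', 'value'],
                'formats': [np.int32, np.float64],
                'offsets': [0, 8],
                'itemsize': 24})
assert rec.itemsize == 24

# round-trip two records through raw bytes, as if read from a file
buf = np.zeros(2, dtype=rec)
buf['id'] = [1, 2]
buf['value'] = [0.5, 1.5]
raw = buf.tobytes()
back = np.frombuffer(raw, dtype=rec)
assert len(raw) == 48
assert back['id'].tolist() == [1, 2]
```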
Re: [Numpy-discussion] Linking against MKL but still slow?
Hi Christoph, On Mon, Mar 26, 2012 at 10:06 AM, Christoph Dann ch.ro.d...@googlemail.com wrote: Dear list, so far I used Enthought's Python Distribution which contains a compiled version of numpy linked against MKL. Now, I want to implement my own extensions to numpy, so I need to build numpy on my own. So, I installed Intel Parallel Studio including MKL and the C / Fortran compilers. What do you mean by own extensions to NumPy ? If you mean building extensions against the C API of NumPy, then you don't need to build your own NumPy. Building NumPy with Intel Compilers and MKL is a non-trivial process, so I would rather avoid it. If you still want to build it by yourself, could you give us the full output of your build ? David
[Numpy-discussion] [ANN] Bento 0.0.8.1
Hi, I am pleased to announce a new release of bento, a packaging solution for python which aims at reproducibility, extensibility and simplicity. The main features of this 0.0.8.1 release are: - Path sections can now use conditionals - More reliable convert command to migrate distutils/setuptools/distribute/distutils2 setup.py to bento - Single-file distribution can now include waf itself - Nose is not necessary to run the test suite anymore - Significant improvements to the distutils compatibility layer - LibraryDir support for backward compatibility with distutils packages relying on the package_dir feature Bento source code can be found on github: https://github.com/cournape/Bento Bento documentation is there as well: https://cournape.github.com/Bento regards, David
Re: [Numpy-discussion] NumPy EIG much slower than MATLAB EIG
On Sun, Apr 1, 2012 at 2:28 PM, Kamesh Krishnamurthy kames...@gmail.com wrote: Hello all, I profiled NumPy EIG and MATLAB EIG on the same Macbook pro, and both were linking to the Accelerate framework BLAS. NumPy turns out to be ~4x slower. I've posted details on Stackoverflow: http://stackoverflow.com/q/9955021/974568 Can someone please let me know the reason for the performance gap? I would look at two things: - first, are you sure matlab is not using the MKL instead of the accelerate framework ? I have not used matlab in ages, but you should be able to check by running otool -L on some of the core libraries of matlab, to find out which libraries are linked to it - second, it could be that matlab eig and numpy eig don't use the same underlying lapack API (do they give you the same result ?). This would already be a bit harder to check, unless it is documented explicitly in matlab. regards, David
Re: [Numpy-discussion] NumPy EIG much slower than MATLAB EIG
On Mon, Apr 2, 2012 at 4:45 PM, Chris Barker chris.bar...@noaa.gov wrote: On Mon, Apr 2, 2012 at 2:25 AM, Nathaniel Smith n...@pobox.com wrote: To see if this is an effect of numpy using C-order by default instead of Fortran-order, try measuring eig(x.T) instead of eig(x)? Just to be clear, .T re-arranges the strides (making it Fortran order), but you'll have to make sure your original data is the transpose of what you want. I posted this on Stackoverflow, but for completeness: the code posted there is also profiling the random number generation -- I have no idea how numpy and MATLAB's random number generation compare, nor how random number generation compares to eig(), but you should profile them independently to make sure. While this is true, the cost is most likely negligible compared to the cost of eig (unless something weird is going on in random as well). David
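The profiling point is easy to act on: generate the matrix once outside the timed region, then time eig and the random generation separately. A sketch, with the size kept small so it runs quickly:

```python
import timeit
import numpy as np

x = np.random.rand(50, 50)  # generated once, outside the timed call

# time only the decomposition
t_eig = timeit.timeit(lambda: np.linalg.eig(x), number=5)
# time only the random number generation
t_rand = timeit.timeit(lambda: np.random.rand(50, 50), number=5)

# for a 50x50 problem, eig (O(n^3) with a large constant) dominates
assert t_rand < t_eig
```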
Re: [Numpy-discussion] YouTrack testbed
On Tue, Apr 10, 2012 at 8:40 PM, Ralf Gommers ralf.gomm...@googlemail.com wrote: On Mon, Apr 9, 2012 at 10:32 PM, Bryan Van de Ven bry...@continuum.io wrote: On 4/3/12 4:18 PM, Ralf Gommers wrote: Here are some first impressions. The good: - It's responsive! - It remembers my preferences (view type, # of issues per page, etc.) - Editing multiple issues with the command window is easy. - Search and filter functionality is powerful The bad: - Multiple projects are supported, but issues are then really mixed. The way this works doesn't look very useful for combined admin of numpy/scipy trackers. - I haven't found a way yet to make versions and subsystems appear in the one-line issue overview. - Fixed issues are still shown by default. There are several open issues filed against youtrack about this, with no reasonable answers. - Plain text attachments (.txt, .diff, .patch) can't be viewed, only downloaded. - No direct VCS integration, only via Teamcity (not set up, so can't evaluate). - No useful default views as in Trac (http://projects.scipy.org/scipy/report). Ralf, regarding some of the issues: Hi Bryan, thanks for looking into this. I think for numpy/scipy trackers, we could simply run separate instances of YouTrack for each. That would work. It does mean that there's no maintenance advantage over using Trac here. Also we can certainly create some standard queries. It's a small pain not to have useful defaults, but it's only a one-time pain. :) That should help. Also, what kind of integration are you looking for with github? There does appear to be the ability to issue commands to youtrack through git commits, which does not depend on TeamCity, as best I can tell: http://confluence.jetbrains.net/display/YTD3/GitHub+Integration http://blogs.jetbrains.com/youtrack/tag/github-integration/ I'm not sure this is what you were thinking about though. That does help.
The other thing that's useful is to reference commits (like commit:abcd123 in current Trac) and have them turned into links to commits on Github. This is not a showstopper for me though. For the other issues, Maggie or I can try and see what we can find out about implementing them, or working around them, this week. I'd say that from the issues I mentioned, the biggest one is the one-line view. So these two: - I haven't found a way yet to make versions and subsystems appear in the one-line issue overview. - Fixed issues are still shown by default. There are several open issues filed against youtrack about this, with no reasonable answers. Of course, we'd like to evaluate any other viable issue trackers as well. Do you have any suggestions for other systems besides YouTrack? David wrote up some issues (some of which I didn't check) with current Trac and looked at Redmine before. He also mentioned Roundup. See http://projects.scipy.org/numpy/wiki/ImprovingIssueWorkflow Redmine does look good from a quick browse (better view, does display diffs). It would be good to get the opinions of a few more people on this topic. Redmine is trac on RoR, but it solves two significant issues over trac: - mass edit (e.g. moving things to a new milestone is simple and doable from the UI) - REST API by default, so that we can build simple command line tools on top of it (this changed since I made the wiki page) It is a PITA to install, though, at least if you are not familiar with ruby, and I heard it is hard to manage as well. IIRC, roundup was suggested by Robert, but it is more of a custom solution I believe. David
Re: [Numpy-discussion] YouTrack testbed
On Thu, Apr 12, 2012 at 5:43 PM, Ralf Gommers ralf.gomm...@googlemail.com wrote: On Tue, Apr 10, 2012 at 9:53 PM, David Cournapeau courn...@gmail.com wrote: Redmine is trac on RoR, but it solves two significant issues over trac: - mass edit (e.g. moving things to a new milestone is simple and doable from the UI) - REST API by default, so that we can build simple command line tools on top of it (this changed since I made the wiki page) It is a PITA to install, though, at least if you are not familiar with ruby, and I heard it is hard to manage as well. Thanks, that's a clear description of pros and cons. It's also easy to play with Redmine at demo.redmine.org.
That site allows you to set up a new project and try the admin interface. And I just discovered this (and in Python!): https://github.com/coiled-coil/git-redmine David
Re: [Numpy-discussion] YouTrack testbed
On Thu, Apr 12, 2012 at 9:29 PM, william ratcliff william.ratcl...@gmail.com wrote: Has anyone tried Rietveld, Gerrit, or Phabricator? rietveld and gerrit are code review tools. I have not heard of phabricator, but this article certainly makes it sound interesting: http://www.readwriteweb.com/hack/2011/09/a-look-at-phabricator-facebook.php There is a quite complete command line interface, arcanist, and if done right, having code review and bug tracking integrated together sounds exciting. Thanks for mentioning it, I will definitely check it out. regards, David
Re: [Numpy-discussion] What is consensus anyway
On Wed, Apr 25, 2012 at 10:54 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Wed, Apr 25, 2012 at 2:35 PM, Travis Oliphant tra...@continuum.io wrote: Do you agree that Numpy has not been very successful in recruiting and maintaining new developers compared to its large user-base? Compared to - say - Sympy? Why do you think this is? I think it's mostly because it's infrastructure that is a means to an end. I certainly wasn't excited to have to work on NumPy originally, when my main interest was SciPy. I've come to love the interesting plateau that NumPy lives on. But, I think it mostly does the job it is supposed to do. The fact that it is in C is also not very sexy. It is also rather complicated with a lot of inter-related parts. I think NumPy could do much, much more --- but getting there is going to be a challenge of execution and education. You can get to know the code base. It just takes some time and patience. You also have to be comfortable with compilers and building software just to tweak the code. Would you consider asking that question directly on list and asking for the most honest possible answers? I'm always interested in honest answers and welcome any sincere perspective. Of course, there are potential explanations: 1) Numpy is too low-level for most people 2) The C code is too complicated 3) It's fine already, more or less are some obvious ones. I would say those are the easy answers. But of course, the easy answer may not be the right answer. It may not be easy to get the right answer [1]. As you can see from Alan Isaac's reply on this thread, even asking the question can be taken as being in bad faith. In that situation, I think you'll find it hard to get sincere replies. While I don't think jumping into NumPy C code is as difficult as some people made it out to be, I think numpy reaped most of the low-hanging fruits, and is now at a stage where it requires massive investment to get significantly better.
I would suggest a different question, whose answer may serve as a proxy to uncover the lack of contributions: what needs to be done in NumPy, and how can we make it simpler for newcomers ? Here is an incomplete, unashamedly biased list: - Fewer dependencies on CPython internals - Allow for 3rd parties to extend numpy at the C level in more fundamental ways (e.g. I wish something like a half-float dtype could be more easily developed out of tree) - Separate memory representation from higher level representation (slicing, broadcasting, etc…), to allow arrays to sit on non-contiguous memory areas, etc… - Test and performance infrastructure so we can track our evolution, get coverage of our C code, etc… - Fix bugs - Better integration with 3rd party on-disk storage (database, etc…) None of that is particularly simple nor has a fast learning curve, except for fixing bugs and maybe some of the infrastructure. I think most of this is necessary for the things Travis talked about a few weeks ago. What could make contributions easier: - different levels of C API documentation (still lacking anything besides reference) - ways to detect early when we break ABI on slightly more obscure platforms (we need good CI, ways to publish binaries that people can easily test, etc...) - improve infrastructure so that we can focus on the things we want to work on (improve the dire situation with bug tracking, etc…) Also, lots of people just don't know/want to know C. But people with, say, web skills would be welcome: we have a website that could use some help… So
Re: [Numpy-discussion] Quaternion data type
On Sat, May 5, 2012 at 9:43 PM, Mark Wiebe mwwi...@gmail.com wrote: On Sat, May 5, 2012 at 1:06 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Sat, May 5, 2012 at 11:19 AM, Mark Wiebe mwwi...@gmail.com wrote: On Sat, May 5, 2012 at 11:55 AM, Charles R Harris charlesr.har...@gmail.com wrote: On Sat, May 5, 2012 at 5:27 AM, Tom Aldcroft aldcr...@head.cfa.harvard.edu wrote: On Fri, May 4, 2012 at 11:44 PM, Ilan Schnell ischn...@enthought.com wrote: Hi Chuck, thanks for the prompt reply. I was curious because someone was interested in adding http://pypi.python.org/pypi/Quaternion to EPD, but Martin and Mark's implementation of quaternions looks much better. Hi - I'm a co-author of the above-mentioned Quaternion package. I agree the numpy_quaternion version would be better, but if there is no expectation that it will move forward I can offer to improve our Quaternion. A few months ago I played around with making it accept arbitrary array inputs (with similar shape of course) to essentially vectorize the transformations. We never got around to putting this in a release because of a perceived lack of interest / priorities... If this would be useful then let me know. Would you be interested in carrying Martin's package forward? I'm not opposed to having quaternions in numpy/scipy but there needs to be someone to push it and deal with problems if they come up. Martin's package disappeared in large part because Martin disappeared. I'd also like to hear from Mark about other aspects, as there was also a simple rational user type proposed that we were looking to put in as an extension 'test' type. IIRC, there were some needed fixes to Numpy, some of which were postponed in favor of larger changes. User types is one of the things we want to get fixed up. I kind of like the idea of there being a package, separate from numpy, which collects these dtypes together. 
To start, the quaternion and the rational type could go in it, and eventually I think it would be nice to move datetime64 there as well. Maybe it could be called numpy-dtypes, or would a more creative name be better? I'm trying to think about how that would be organized. We could create a new repository, numpy-user-types (numpy-extension-types), under the numpy umbrella. It would need documents and such as well as someone interested in maintaining it and making releases. A branch in the numpy repository wouldn't work since we would want to rebase it regularly. It could maybe go in scipy but a new package would need to be created there and it feels too distant from numpy for such basic types as datetime. Do you have thoughts about the details? Another repository under the numpy umbrella would best fit what I'm imagining, yes. I would imagine it as a package of additional types that aren't the core ones, but that many people would probably want to install. It would also be a way to continually exercise the type extension system, to make sure it doesn't break. It couldn't be a branch of numpy, rather a collection of additional dtypes and associated useful functions. I would be in favor of this as well. We could start the repository by having one trivial dtype that would serve as an example. That's something I have been interested in, I can lock a couple of hours / week to help this with. David ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
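As background for readers, the surface of a quaternion scalar type is small -- essentially the Hamilton product plus conjugation. A pure-Python sketch of that arithmetic (illustrative only; this is not the API of either package mentioned above):

```python
# Hamilton product of two quaternions (w, x, y, z) -- the core
# operation a quaternion dtype's multiply ufunc loop would implement.
def qmul(a, b):
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw*bw - ax*bx - ay*by - az*bz,
            aw*bx + ax*bw + ay*bz - az*by,
            aw*by - ax*bz + ay*bw + az*bx,
            aw*bz + ax*by - ay*bx + az*bw)

def qconj(a):
    # Conjugate: negate the vector part.
    w, x, y, z = a
    return (w, -x, -y, -z)

# Sanity checks: i * j = k, and q * conj(q) has the squared norm
# in the scalar part and zero vector part.
i, j, k = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
assert qmul(i, j) == k
assert qmul(i, qconj(i)) == (1, 0, 0, 0)
```

A dtype implementation wraps exactly this arithmetic in C ufunc inner loops; the Python-level semantics stay the same.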
Re: [Numpy-discussion] Fixing issue of future opaqueness of ndarray this summer
On Mon, May 14, 2012 at 5:31 PM, mark florisson markflorisso...@gmail.com wrote: On 12 May 2012 22:55, Dag Sverre Seljebotn d.s.seljeb...@astro.uio.no wrote: On 05/11/2012 03:37 PM, mark florisson wrote: On 11 May 2012 12:13, Dag Sverre Seljebotn d.s.seljeb...@astro.uio.no wrote: (NumPy devs: I know, I get too many ideas. But this time I *really* believe in it, I think this is going to be *huge*. And if Mark F. likes it it's not going to be without manpower; and as his mentor I'd pitch in too here and there.) (Mark F.: I believe this is *very* relevant to your GSoC. I certainly don't want to micro-manage your GSoC, just have your take.) Travis, thank you very much for those good words in the NA-mask interactions... thread. It put most of my concerns away. If anybody is leaning towards opaqueness because of its OOP purity, I want to refer to C++ and its walled-garden of ideological purity -- it has, what, 3-4 different OOP array libraries, none of which is able to out-compete the others. Meanwhile the rest of the world happily cooperates using pointers, strides, CSR and CSC. Now, there are limits to what you can do with strides and pointers. No one's denying the need for more. In my mind that's an API where you can do fetch_block and put_block of cache-sized, N-dimensional blocks on an array; but it might be something slightly different. Here's what I'm asking: DO NOT simply keep extending ndarray and the NumPy C API to deal with this issue. What we need is duck-typing/polymorphism at the C level. If you keep extending ndarray and the NumPy C API, what we'll have is a one-to-many relationship: One provider of array technology, multiple consumers (with hooks, I'm sure, but all implementations of the hook concept in the NumPy world I've seen so far are a total disaster!). What I think we need instead is something like PEP 3118 for the abstract array that is only available block-wise with getters and setters. 
On the Cython list we've decided that what we want for CEP 1000 (for boxing callbacks etc.) is to extend PyTypeObject with our own fields; we could create CEP 1001 to solve this issue and make any Python object an exporter of block-getter/setter-arrays (better name needed). What would be exported is (of course) a simple vtable: typedef struct { int (*get_block)(void *ctx, ssize_t *upper_left, ssize_t *lower_right, ...); ... } block_getter_setter_array_vtable; Let's please discuss the details *after* the fundamentals. But the reason I put void* there instead of PyObject* is that I hope this could be used beyond the Python world (say, Python-Julia); the void* would be handed to you at the time you receive the vtable (however we handle that). I suppose it would also be useful to have some way of predicting the output format polymorphically for the caller. E.g. dense * block_diagonal results in block diagonal, but dense + block_diagonal results in dense, etc. It might be useful for the caller to know whether it needs to allocate a sparse, dense or block-structured array. Or maybe the polymorphic function could even do the allocation. This needs to happen recursively of course, to avoid intermediate temporaries. The compiler could easily handle that, and so could numpy when it gets lazy evaluation. Ah. But that depends too on the computation to be performed too; a) elementwise, b) axis-wise reductions, c) linear algebra... In my oomatrix code (please don't look at it, it's shameful) I do this using multiple dispatch. I'd rather ignore this for as long as we can, only implementing a[:] = ... -- I can't see how decisions here would trickle down to the API that's used in the kernel, it's more like a pre-phase, and better treated orthogonally. I think if the heavy lifting of allocating output arrays and exporting these arrays work in numpy, then support in Cython could use that (I can already hear certain people object to more complicated array stuff in Cython :). 
Even better here would be an external project that each of our projects could use (I still think the nditer sorting functionality of arrays should be numpy-agnostic and externally available). I agree with the separate project idea. It's trivial for NumPy to incorporate that as one of its methods for exporting arrays, and I don't think it makes sense to either build it into Cython, or outright depend on NumPy. Here's what I'd like (working title: NumBridge?). - Mission: Be the double* + shape + strides in a world where that is no longer enough, by providing tight, focused APIs/ABIs that are usable across C/Fortran/Python. I basically want something I can quickly acquire from a NumPy array, then pass it into my C code without dragging along all the cruft that I don't need. - Written in pure C + specs,
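The block getter/setter protocol sketched above is a C-level idea, but the division of labor can be modeled in a few lines of Python (every name here is hypothetical, not part of any existing API): the exporter alone knows its memory layout, and the consumer only ever asks for rectangular blocks.

```python
class DenseExporter:
    """A dense row-major 2-D array that exposes only a block getter."""
    def __init__(self, data, shape):
        self.data = data          # flat list, row-major
        self.shape = shape        # (rows, cols)

    def get_block(self, upper_left, lower_right):
        """Return the half-open block [upper_left, lower_right) as a
        flat row-major list -- the analogue of the C get_block slot."""
        (r0, c0), (r1, c1) = upper_left, lower_right
        ncols = self.shape[1]
        return [self.data[i * ncols + j]
                for i in range(r0, r1) for j in range(c0, c1)]

def block_sum(exporter, block_shape):
    """Consumer side: reduce the array cache-sized-block by block,
    without ever touching the exporter's internal storage."""
    rows, cols = exporter.shape
    br, bc = block_shape
    total = 0
    for r in range(0, rows, br):
        for c in range(0, cols, bc):
            block = exporter.get_block(
                (r, c), (min(r + br, rows), min(c + bc, cols)))
            total += sum(block)
    return total

arr = DenseExporter(list(range(6)), (2, 3))   # [[0, 1, 2], [3, 4, 5]]
assert arr.get_block((0, 1), (2, 3)) == [1, 2, 4, 5]
assert block_sum(arr, (2, 2)) == 15
```

A sparse or block-diagonal exporter would implement the same `get_block` signature with a completely different storage scheme, which is the polymorphism the proposal is after.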
Re: [Numpy-discussion] Masked Array for NumPy 1.7
On Sat, May 19, 2012 at 3:17 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Fri, May 18, 2012 at 3:47 PM, Travis Oliphant tra...@continuum.io wrote: Hey all, After reading all the discussion around masked arrays and getting input from as many people as possible, it is clear that there is still disagreement about what to do, but there have been some fruitful discussions that ensued. This isn't really new as there was significant disagreement about what to do when the masked array code was initially checked in to master. So, in order to move forward, Mark and I are going to work together with whoever else is willing to help with an effort that is in the spirit of my third proposal but has a few adjustments. The idea will be fleshed out in more detail as it progresses, but the basic concept is to create an (experimental) ndmasked object in NumPy 1.7 and leave the actual ndarray object unchanged. While the details need to be worked out here, a goal is to have the C-API work with both ndmasked arrays and array objects (possibly by defining a base-class C-level structure that both ndarrays inherit from). This might also be a good way for Dag to experiment with his ideas as well but that is not an explicit goal. One way this could work, for example, is to have PyArrayObject * be the base-class array (essentially the same C-structure we have now with a HASMASK flag). Then, the ndmasked object could inherit from PyArrayObject * as well but add more members to the C-structure. I think this is the easiest thing to do and requires the least amount of code-change. It is also possible to define an abstract base-class PyArrayObject * that both ndarray and ndmasked inherit from. That way ndarray and ndmasked are siblings even though the ndarray would essentially *be* the PyArrayObject * --- just with a different type-hierarchy on the python side. 
This work will take some time and, therefore, I don't expect 1.7 to be released prior to SciPy Austin with an end of June target date. The timing will largely depend on what time is available from people interested in resolving the situation. Mark and I will have some availability for this work in June but not a great deal (about 2 man-weeks total between us). If there are others who can step in and help, it will help accelerate the process. This will be a difficult thing for others to help with since the concept is vague, the design decisions seem to be in your and Mark's hands, and you say you don't have much time. It looks to me like 1.7 will keep slipping and I don't think that is a good thing. Why not go for option 2, which will get 1.7 out there and push the new masked array work into 1.8? Breaking the flow of development and release has consequences, few of them good. Agreed. 1.6.0 was released one year ago already, let's focus on polishing what's in there *now*. I have not followed closely what the decision was for a LTS release, but if 1.7 is supposed to be it, that's another argument against changing anything there for 1.7. David
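The base-class layout Travis describes relies on the standard C idiom that a struct whose first member is the base struct can be cast to the base type. A small ctypes sketch shows the idea (the field names are invented for illustration and are not NumPy's real layout):

```python
import ctypes

class PyArrayBase(ctypes.Structure):
    # Stand-in for the shared base layout; flags would carry a
    # hypothetical HASMASK bit.
    _fields_ = [("nd", ctypes.c_int), ("flags", ctypes.c_int)]

class NdMasked(ctypes.Structure):
    # The base must be the FIRST member, so that a NdMasked* is
    # also a valid PyArrayBase*.
    _fields_ = [("base", PyArrayBase), ("maskdata", ctypes.c_void_p)]

m = NdMasked()
m.base.nd = 2
m.base.flags = 0x1  # pretend this is HASMASK

# Casting a pointer to the derived struct to the base type sees the
# same leading bytes -- this is what would let one C-API function
# serve both ndarray and ndmasked objects.
p = ctypes.cast(ctypes.pointer(m), ctypes.POINTER(PyArrayBase))
assert p.contents.nd == 2
assert p.contents.flags == 0x1
assert ctypes.addressof(m) == ctypes.addressof(m.base)
```

The "siblings" variant Travis mentions is the same trick applied twice: both ndarray and ndmasked embed the abstract base as their first member.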
Re: [Numpy-discussion] [SciPy-Dev] Ubuntu PPA for NumPy / SciPy / ...
On Thu, Jun 7, 2012 at 5:24 PM, Andreas Hilboll li...@hilboll.de wrote: Hi, I just noticed that there's a PPA for NumPy/SciPy on Launchpad: https://launchpad.net/~scipy/+archive/ppa However, it's painfully outdated. Does anyone know of its status? Is it 'official'? Are there any plans to revitalize it, possibly with adding other projects from the scipy universe? Is there help needed? Many questions, but possibly quite easy to answer ... I set up this PPA a long time ago. I just don't have time to maintain it at this point, but would be happy to give someone the keys to bring it up to date. David
Re: [Numpy-discussion] Neighborhood iterator: way to easily check which elements have already been visited in parent iterator?
Not the neighborhood one, though. It would be good if this iterator had a cython wrapper, and ndimage used that, though. On 13 June 2012 18:59, Ralf Gommers ralf.gomm...@googlemail.com wrote: On Wed, Jun 13, 2012 at 6:57 PM, Thouis (Ray) Jones tho...@gmail.com wrote: Hello, I'm rewriting scipy.ndimage.label() using numpy's iterator API. I think there were some changes to the iterator API recently, so please keep in mind that scipy still has to be compatible with numpy 1.5.1 (at least for now). Ralf
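For readers unfamiliar with it, the neighborhood iterator walks a small window around each point of a parent iteration, with out-of-bounds points filled according to a padding mode. A rough pure-Python model of the constant-padding case -- just the semantics, not the C API:

```python
def neighborhood(arr, center, radius, fill=0):
    """Values of the (2*radius+1) x (2*radius+1) window around
    `center` in a 2-D list-of-lists `arr`, row-major order;
    out-of-bounds entries are replaced by `fill`."""
    rows, cols = len(arr), len(arr[0])
    ci, cj = center
    out = []
    for i in range(ci - radius, ci + radius + 1):
        for j in range(cj - radius, cj + radius + 1):
            if 0 <= i < rows and 0 <= j < cols:
                out.append(arr[i][j])
            else:
                out.append(fill)
    return out

a = [[1, 2],
     [3, 4]]
# 3x3 window around the top-left corner: five out-of-bounds zeros.
assert neighborhood(a, (0, 0), 1) == [0, 0, 0, 0, 1, 2, 0, 3, 4]
```

The question in the subject line -- knowing which neighborhood points the parent iterator has already visited -- amounts to knowing, inside this double loop, which `(i, j)` precede `center` in the parent's iteration order.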
[Numpy-discussion] [ANN] Bento 0.1.0
Hi, I am pleased to announce a new release of bento, a packaging solution for python which aims at reproducibility, extensibility and simplicity. The main features of this 0.1.0 release are: - new commands register_pypi and upload_pypi to register a package to pypi and upload tarballs to it. - add sphinx command to build a package documentation if it uses sphinx. - add tweak_library/tweak_extension functions to build contexts to simplify simple builder customization (e.g. include_dirs, defines, etc...) - waf backend: cython tool automatically loaded if cython files are detected in sources - UseBackends feature: allows declaring which build backend to use when building C extensions in the bento.info file directly - add --use-distutils-flags configure option to force using flags from distutils (disabled by default). - add --disable-autoconfigure build option to bypass configure for fast partial rebuilds. This is not reliable depending on how the environment is changed, so one should only use this during development. - add register_metadata API to register new metadata to be filled in MetaTemplateFile Bento source code can be found on github: https://github.com/cournape/Bento Bento documentation is there as well: https://cournape.github.com/Bento regards, David
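For readers who have not seen bento, the package description lives in a declarative bento.info file. The fragment below is written from memory and is purely illustrative -- the exact field names and the UseBackends syntax should be checked against the documentation linked above:

```
Name: hello
Version: 0.1.0
Summary: Example package built with bento

UseBackends: Waf

Library:
    Packages: hello
    Extension: hello._fast
        Sources: src/fastmodule.c
```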
Re: [Numpy-discussion] Pull request: Split maskna support out of mainline into a branch
On Thu, Jun 14, 2012 at 5:17 PM, Nathaniel Smith n...@pobox.com wrote: On Wed, Jun 6, 2012 at 11:08 PM, Nathaniel Smith n...@pobox.com wrote: Just submitted this pull request for discussion: https://github.com/numpy/numpy/pull/297 As per earlier discussion on the list, this PR attempts to remove exactly and only the maskna-related code from numpy mainline: http://mail.scipy.org/pipermail/numpy-discussion/2012-May/062417.html The suggestion is that we merge this to master for the 1.7 release, and immediately git revert it on a branch so that it can be modified further without blocking the release. The first patch does the actual maskna removal; the second and third rearrange things so that PyArray_ReduceWrapper does not end up in the public API, for reasons described therein. All tests pass with Python 2.4, 2.5, 2.6, 2.7, 3.1, 3.2 on 64-bit Ubuntu. The docs also appear to build. Before I re-based this I also tested against Scipy, matplotlib, and pandas, and all were fine. While it's tempting to think that the lack of response to this email/PR indicates that everyone now agrees with me about how to proceed with the NA work, I'm for some reason unconvinced... Any objections to merging this? No objection, but could you wait until this weekend? I am in the middle of setting up a buildbot for windows for numpy (for both mingw and MSVC compilers), and that would be a good way to test it. David
Re: [Numpy-discussion] Pull request: Split maskna support out of mainline into a branch
On Sat, Jun 16, 2012 at 9:39 PM, Nathaniel Smith n...@pobox.com wrote: On Thu, Jun 14, 2012 at 5:20 PM, David Cournapeau courn...@gmail.com wrote: On Thu, Jun 14, 2012 at 5:17 PM, Nathaniel Smith n...@pobox.com wrote: On Wed, Jun 6, 2012 at 11:08 PM, Nathaniel Smith n...@pobox.com wrote: Just submitted this pull request for discussion: https://github.com/numpy/numpy/pull/297 As per earlier discussion on the list, this PR attempts to remove exactly and only the maskna-related code from numpy mainline: http://mail.scipy.org/pipermail/numpy-discussion/2012-May/062417.html The suggestion is that we merge this to master for the 1.7 release, and immediately git revert it on a branch so that it can be modified further without blocking the release. The first patch does the actual maskna removal; the second and third rearrange things so that PyArray_ReduceWrapper does not end up in the public API, for reasons described therein. All tests pass with Python 2.4, 2.5, 2.6, 2.7, 3.1, 3.2 on 64-bit Ubuntu. The docs also appear to build. Before I re-based this I also tested against Scipy, matplotlib, and pandas, and all were fine. While it's tempting to think that the lack of response to this email/PR indicates that everyone now agrees with me about how to proceed with the NA work, I'm for some reason unconvinced... Any objections to merging this? No objection, but could you wait for this WE ? I am in the middle of setting up a buildbot for windows for numpy (for both mingw and MSVC compilers), and that would be a good way to test it. Sounds like we have consensus and the patch is good to go, so let me know when you're ready... 
Setting up the windows buildbot is even more of a pain than I expected :( In the end, I just tested your branch with MSVC for python 2.7 (32 bits), and got the following errors related to NA: == ERROR: test_numeric.TestIsclose.test_masked_arrays -- Traceback (most recent call last): File C:\Python27\lib\site-packages\nose-1.1.2-py2.7.egg\nose\case.py, line 197, in runTest self.test(*self.arg) File C:\Users\david\tmp\numpy-git\numpy\core\tests\test_numeric.py, line 1274, in test_masked_arrays assert_(type(x) == type(isclose(inf, x))) File C:\Users\david\tmp\numpy-git\numpy\core\numeric.py, line 2073, in isclose cond[~finite] = (x[~finite] == y[~finite]) File C:\Users\david\tmp\numpy-git\numpy\ma\core.py, line 3579, in __eq__ check = ndarray.__eq__(self.filled(0), odata).view(type(self)) AttributeError: 'NotImplementedType' object has no attribute 'view' == ERROR: Test a special case for var -- Traceback (most recent call last): File C:\Users\david\tmp\numpy-git\numpy\ma\tests\test_core.py, line 2735, in test_varstd_specialcases _ = method(out=nout) File C:\Users\david\tmp\numpy-git\numpy\ma\core.py, line 4778, in std dvar = sqrt(dvar) File C:\Users\david\tmp\numpy-git\numpy\ma\core.py, line 849, in __call__ m |= self.domain(d) File C:\Users\david\tmp\numpy-git\numpy\ma\core.py, line 801, in __call__ return umath.less(x, self.critical_value) RuntimeWarning: invalid value encountered in less David
Re: [Numpy-discussion] Created NumPy 1.7.x branch
On Tue, Jun 26, 2012 at 4:10 AM, Ondřej Čertík ondrej.cer...@gmail.com wrote: My understanding is that Travis is simply trying to stress We have to think about the implications of our changes on existing users. and also that little changes (with the best intentions!) that however mean either a breakage or confusion for users (due to historical reasons) should be avoided if possible. And I very strongly feel the same way. And I think that most people on this list do as well. I think Travis is more concerned about API than ABI changes (in that example for 1.4, the ABI breakage was caused by a change that was pushed by Travis IIRC). The relative importance of API vs ABI is a tough one: I think ABI breakage is as bad as API breakage (but matters in different circumstances), but it is hard to improve the situation around our ABI without changing the API (especially everything around macros and publicly accessible structures). Changing this is politically difficult because nobody will upgrade to a new numpy with a different API just because it is cleaner, but without a cleaner API, it will be difficult to implement quite a few improvements. The situation is not that different from python 3, which has seen poor adoption, and only starts having interesting features on its own now. As for more concrete actions: I believe Wes McKinney has a comprehensive suite with multiple versions of numpy/pandas; I can't seem to find where that was mentioned, though. This would be a good starting point to check ABI matters (say pandas, mpl, scipy on top of multiple numpy). David
Re: [Numpy-discussion] Created NumPy 1.7.x branch
On Tue, Jun 26, 2012 at 4:42 AM, Ondřej Čertík ondrej.cer...@gmail.com wrote: On Mon, Jun 25, 2012 at 8:35 PM, David Cournapeau courn...@gmail.com wrote: On Tue, Jun 26, 2012 at 4:10 AM, Ondřej Čertík ondrej.cer...@gmail.com wrote: My understanding is that Travis is simply trying to stress We have to think about the implications of our changes on existing users. and also that little changes (with the best intentions!) that however mean either a breakage or confusion for users (due to historical reasons) should be avoided if possible. And I very strongly feel the same way. And I think that most people on this list do as well. I think Travis is more concerned about API than ABI changes (in that example for 1.4, the ABI breakage was caused by a change that was pushed by Travis IIRC). The relative importance of API vs ABI is a tough one: I think ABI breakage is as bad as API breakage (but matter in different circumstances), but it is hard to improve the situation around our ABI without changing the API (especially everything around macros and publicly accessible structures). Changing this is politically difficult because nobody will upgrade to a new numpy with a different API just because it is cleaner, but without a cleaner API, it will be difficult to implement quite a few improvements. The situation is not that different form python 3, which has seen a poor adoption, and only starts having interesting feature on its own now. As for more concrete actions: I believe Wes McKinney has a comprehensive suite with multiple versions of numpy/pandas, I can't seem to find where that was mentioned, though. This would be a good starting point to check ABI matters (say pandas, mpl, scipy on top of multiple numpy). I will try to check as many packages as I can to see what actual problems arise. I have created an issue for it: https://github.com/numpy/numpy/issues/319 Feel free to add more packages that you feel are important. 
I will try to check at least the ones that are in the issue, and more if I have time. I will close the issue once the upgrade path is clearly documented in the release for everything that breaks. I believe the basis can be 1.4.1 against which we build different packages, and then test each new version. There are also tools to check ABI compatibility (e.g. http://ispras.linuxbase.org/index.php/ABI_compliance_checker), but I have never used them. Being able to tell when a version of numpy breaks ABI would already be a good improvement. David
Re: [Numpy-discussion] Created NumPy 1.7.x branch
On Tue, Jun 26, 2012 at 5:17 AM, Travis Oliphant tra...@continuum.io wrote: On Jun 25, 2012, at 10:35 PM, David Cournapeau wrote: On Tue, Jun 26, 2012 at 4:10 AM, Ondřej Čertík ondrej.cer...@gmail.com wrote: My understanding is that Travis is simply trying to stress We have to think about the implications of our changes on existing users. and also that little changes (with the best intentions!) that however mean either a breakage or confusion for users (due to historical reasons) should be avoided if possible. And I very strongly feel the same way. And I think that most people on this list do as well. I think Travis is more concerned about API than ABI changes (in that example for 1.4, the ABI breakage was caused by a change that was pushed by Travis IIRC). In the present climate, I'm going to have to provide additional context to a comment like this. This is not an accurate enough characterization of events. I was trying to get date-time changes in, for sure. I generally like feature additions to NumPy. (Robert Kern was also involved with that effort and it was funded by an active user of NumPy.) I was concerned that the changes would break the ABI. I did not mean to go back over old history, sorry. My main point was to highlight ABI vs API issues. Numpy needs to decide whether it attempts to keep ABI or not. We already had this discussion 2 years ago (for the issue mentioned by Ondrej), and the decision was not made. The arguments and their value did not really change. The issue is thus that a decision needs to be made over that disagreement in one way or the other. David
Re: [Numpy-discussion] Created NumPy 1.7.x branch
On Tue, Jun 26, 2012 at 10:27 AM, Dag Sverre Seljebotn d.s.seljeb...@astro.uio.no wrote: On 06/26/2012 05:35 AM, David Cournapeau wrote: On Tue, Jun 26, 2012 at 4:10 AM, Ondřej Čertík ondrej.cer...@gmail.com wrote: My understanding is that Travis is simply trying to stress We have to think about the implications of our changes on existing users. and also that little changes (with the best intentions!) that however mean either a breakage or confusion for users (due to historical reasons) should be avoided if possible. And I very strongly feel the same way. And I think that most people on this list do as well. I think Travis is more concerned about API than ABI changes (in that example for 1.4, the ABI breakage was caused by a change that was pushed by Travis IIRC). The relative importance of API vs ABI is a tough one: I think ABI breakage is as bad as API breakage (but matter in different circumstances), but it is hard to improve the situation around our ABI without changing the API (especially everything around macros and publicly accessible structures). Changing this is politically But I think it is *possible* to get to a situation where ABI isn't broken without changing API. I have posted such a proposal. If one uses the kind of C-level duck typing I describe in the link below, one would do typedef PyObject PyArrayObject; typedef struct { ... } NumPyArray; /* used to be PyArrayObject */ Maybe we're just in violent agreement, but whatever ends up being used would require changing the *current* C API, right? If one wants to allow for changes in our structures more freely, we have to hide them from the headers, which means breaking the code that depends on the structure binary layout. Any code that accesses those directly will need to be changed. There is the particular issue of iterators, which seem quite difficult to make ABI-safe without losing significant performance. 
cheers, David
[Numpy-discussion] moving forward around ABI/API compatibilities (was numpy 1.7.x branch)
Hi, I am just continuing the discussion around ABI/API, the technical side of things that is, as this is unrelated to the 1.7.x release. On Tue, Jun 26, 2012 at 11:41 AM, Dag Sverre Seljebotn d.s.seljeb...@astro.uio.no wrote: On 06/26/2012 11:58 AM, David Cournapeau wrote: On Tue, Jun 26, 2012 at 10:27 AM, Dag Sverre Seljebotn d.s.seljeb...@astro.uio.no wrote: On 06/26/2012 05:35 AM, David Cournapeau wrote: On Tue, Jun 26, 2012 at 4:10 AM, Ondřej Čertík ondrej.cer...@gmail.com wrote: My understanding is that Travis is simply trying to stress We have to think about the implications of our changes on existing users. and also that little changes (with the best intentions!) that however mean either a breakage or confusion for users (due to historical reasons) should be avoided if possible. And I very strongly feel the same way. And I think that most people on this list do as well. I think Travis is more concerned about API than ABI changes (in that example for 1.4, the ABI breakage was caused by a change that was pushed by Travis IIRC). The relative importance of API vs ABI is a tough one: I think ABI breakage is as bad as API breakage (but matter in different circumstances), but it is hard to improve the situation around our ABI without changing the API (especially everything around macros and publicly accessible structures). Changing this is politically But I think it is *possible* to get to a situation where ABI isn't broken without changing API. I have posted such a proposal. If one uses the kind of C-level duck typing I describe in the link below, one would do typedef PyObject PyArrayObject; typedef struct { ... } NumPyArray; /* used to be PyArrayObject */ Maybe we're just in violent agreement, but whatever ends up being used would require to change the *current* C API, right ? If one wants to Accessing arr->dims[i] directly would need to change. But that's been discouraged for a long time. By API I meant access through the macros. 
One of the changes under discussion here is to change PyArray_SHAPE from a macro that accepts both PyObject* and PyArrayObject* to a function that only accepts PyArrayObject* (hence breakage). I'm saying that under my proposal, assuming I or somebody else can find the time to implement it, you can both make it a function and have it accept both PyObject* and PyArrayObject* (since they are the same), undoing the breakage but allowing the ABI to be hidden. (It doesn't give you full flexibility in ABI, it does require that you somewhere have an npy_intp dims[nd] with the same lifetime as your object, etc., but I don't consider that a big disadvantage). allow for changes in our structures more freely, we have to hide them from the headers, which means breaking the code that depends on the structure binary layout. Any code that access those directly will need to be changed. There is the particular issue of iterator, which seem quite difficult to make ABI-safe without losing significant performance. I don't agree (for some meanings of ABI-safe). You can export the data (dataptr/shape/strides) through the ABI, then the iterator uses these in whatever way it wishes consumer-side. Sort of like PEP 3118 without the performance degradation. The only sane way IMO of doing iteration is building it into the consumer anyway. (I have not read the whole cython discussion yet) What do you mean by building iteration in the consumer? My understanding is that any data export would be done through a level of indirection (dataptr/shape/strides). Conceptually, I can't see how one could keep ABI without that level of indirection without some compilation. In the case of iterators, that means multiple pointer chasing per sample -- i.e. the tight loop issue you mentioned earlier for PyArray_DATA is the common case for iterators. I can only see two ways of doing fast (special casing) iteration: compile-time special casing or runtime optimization. 
Compile-time requires access to the internals (even if one were to use C++ with advanced template magic à la STL/iterator, I don't think one can get performance if everything is not in the headers, but maybe C++ compilers are super smart these days in ways I can't comprehend). I would think runtime is the long-term solution, but that's far away, David
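The trade-off being discussed -- hiding structure members behind accessor functions buys ABI stability at the cost of an indirection per access -- can be illustrated with a toy Python model (all names invented): consumers written once against the accessor keep working when the internal layout changes, exactly as C callers of a hidden struct would keep working across an ABI-stable upgrade.

```python
# "Version 1" of the library: shape stored as a plain attribute
# (the analogue of reading a struct field directly).
class ArrayV1:
    def __init__(self, shape):
        self._shape = tuple(shape)

# "Version 2": internals reorganized; direct field access would break.
class ArrayV2:
    def __init__(self, shape):
        self._meta = {"shape": tuple(shape)}

def get_shape(a):
    """Accessor function: the only entry point consumers may use
    (the analogue of a PyArray_SHAPE-style exported function)."""
    if hasattr(a, "_shape"):
        return a._shape
    return a._meta["shape"]

# Consumer code that goes through the accessor survives the change --
# at the cost of one extra call/indirection per access.
def size(a):
    n = 1
    for d in get_shape(a):
        n *= d
    return n

assert size(ArrayV1((2, 3))) == 6
assert size(ArrayV2((2, 3))) == 6
```

In a tight inner loop that indirection is exactly the cost David is worried about for iterators, which is why the thread distinguishes compile-time specialization (access to internals) from runtime approaches.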
Re: [Numpy-discussion] moving forward around ABI/API compatibilities (was numpy 1.7.x branch)
On Tue, Jun 26, 2012 at 2:40 PM, Dag Sverre Seljebotn d.s.seljeb...@astro.uio.no wrote: On 06/26/2012 01:48 PM, David Cournapeau wrote: Hi, I am just continuing the discussion around ABI/API, the technical side of things that is, as this is unrelated to the 1.7.x release. On Tue, Jun 26, 2012 at 11:41 AM, Dag Sverre Seljebotn d.s.seljeb...@astro.uio.no wrote: On 06/26/2012 11:58 AM, David Cournapeau wrote: On Tue, Jun 26, 2012 at 10:27 AM, Dag Sverre Seljebotn d.s.seljeb...@astro.uio.no wrote: On 06/26/2012 05:35 AM, David Cournapeau wrote: On Tue, Jun 26, 2012 at 4:10 AM, Ondřej Čertík ondrej.cer...@gmail.com wrote: My understanding is that Travis is simply trying to stress "We have to think about the implications of our changes on existing users", and also that little changes (with the best intentions!) that however mean either a breakage or confusion for users (due to historical reasons) should be avoided if possible. And I very strongly feel the same way. And I think that most people on this list do as well. I think Travis is more concerned about API than ABI changes (in that example for 1.4, the ABI breakage was caused by a change that was pushed by Travis IIRC). The relative importance of API vs ABI is a tough one: I think ABI breakage is as bad as API breakage (but they matter in different circumstances), but it is hard to improve the situation around our ABI without changing the API (especially everything around macros and publicly accessible structures). Changing this is politically But I think it is *possible* to get to a situation where the ABI isn't broken without changing the API. I have posted such a proposal. If one uses the kind of C-level duck typing I describe in the link below, one would do typedef PyObject PyArrayObject; typedef struct { ... } NumPyArray; /* used to be PyArrayObject */ Maybe we're just in violent agreement, but whatever ends up being used would require changing the *current* C API, right ? 
If one wants to Accessing arr->dims[i] directly would need to change. But that's been discouraged for a long time. By API I meant access through the macros. One of the changes under discussion here is to change PyArray_SHAPE from a macro that accepts both PyObject* and PyArrayObject* to a function that only accepts PyArrayObject* (hence breakage). I'm saying that under my proposal, assuming I or somebody else can find the time to implement it, you can both make it a function and have it accept both PyObject* and PyArrayObject* (since they are the same), undoing the breakage but allowing to hide the ABI. (It doesn't give you full flexibility in ABI; it does require that you somewhere have an npy_intp dims[nd] with the same lifetime as your object, etc., but I don't consider that a big disadvantage). To allow for changes in our structures more freely, we have to hide them from the headers, which means breaking the code that depends on the structure binary layout. Any code that accesses those directly will need to be changed. There is the particular issue of the iterator, which seems quite difficult to make ABI-safe without losing significant performance. I don't agree (for some meanings of ABI-safe). You can export the data (dataptr/shape/strides) through the ABI, then the iterator uses these in whatever way it wishes consumer-side. Sort of like PEP 3118 without the performance degradation. The only sane way IMO of doing iteration is building it into the consumer anyway. (I have not read the whole cython discussion yet) I'll try to write a summary and post it when I can get around to it. What do you mean by building iteration into the consumer ? My consumer is the user of the NumPy C API. So I meant that the iteration logic is all in C header files and compiled again for each such consumer. Iterators don't cross the ABI boundary. understanding is that any data export would be done through a level of indirection (dataptr/shape/strides). 
Conceptually, I can't see how one could keep ABI without that level of indirection without some compilation. In the case of the iterator, that means multiple pointer chasings per sample -- i.e. the tight loop issue you mentioned earlier for PyArray_DATA is the common case for the iterator. Even if you do indirection, iterator utilities that are compiled in the consumer/user code can cache the data that's retrieved. Iterators just do

// setup: crossing the ABI
npy_intp *shape = PyArray_DIMS(arr);
npy_intp *strides = PyArray_STRIDES(arr);
...
// performance-sensitive code just accesses cached pointers and doesn't
// cross the ABI

The problem is that iterators need more than this. But thinking more about it, I am not so dead sure we could not get there. I will need to play with some code. Going slightly OT, then IMO, the *only* long-term solution in 2012 is LLVM. That allows you to do any level of inlining and special casing and optimization at run-time, which is the only way
Re: [Numpy-discussion] Created NumPy 1.7.x branch
On Tue, Jun 26, 2012 at 5:24 PM, Travis Oliphant tra...@continuum.io wrote: Let us note that that problem was due to Travis convincing David to include the Datetime work in the release against David's own best judgement. The result was a delay of several months until Ralf could get up to speed and get 1.4.1 out. Let us also note that poly1d is actually not the same as Matlab poly1d. This is not accurate, Charles. Please stop trying to dredge up old history you don't know the full story about and are trying to create an alternate reality about. It doesn't help anything and is quite poisonous to this mailing list. I didn't start the discussion of 1.4, nor did I raise the issue at the time as I didn't think it would be productive. We moved forward. But in any case, I asked David at the time why the datetime stuff got included. I'd welcome your version if you care to offer it. That would be more useful than accusing me of creating an alternative reality and would clear the air. The datetime stuff got included because it is a very useful and important feature for multiple users. It still needed work, but it was in a state where it could be tried. It did require breaking ABI compatibility in the state it was in. My approach was to break ABI compatibility and move forward (there were other things we could do at the time that are still needed in the code base that will break ABI compatibility in the future). David didn't want to break ABI compatibility and so tried to satisfy two competing desires in a way that did not ultimately work. These things happen. We all get to share responsibility for the outcome. I think Chuck alludes to the fact that I was rather reserved about merging datetime before *anyone* knew about breaking the ABI. I don't feel responsible for this issue (except I maybe should have pushed more strongly about datetime being included), but I am also not interested in making a big deal out of it, certainly not two years after the fact. 
I am merely pointing this out so that you realize that you may both have different views that could be seen as valid, depending on what you are willing to highlight. I suggest that Chuck and you take this off-list, David ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Combined versus separate build
On Wed, Jun 27, 2012 at 7:17 PM, Nathaniel Smith n...@pobox.com wrote: Currently the numpy build system(s) support two ways of building numpy: either by compiling a giant concatenated C file, or by the more conventional route of first compiling each .c file to a .o file, and then linking those together. I gather from comments in the source code that the former is the traditional method, and the latter is the newer experimental approach. It's easy to break one of these builds without breaking the other (I just did this with the NA branch, and David had to clean up after me), and I don't see what value we really get from having both options -- it seems to just double the size of the test matrix without adding value. There is unfortunately a big value in it: there is no standard way in C to share symbols within a library without polluting the whole process namespace, except on windows where the default is to export nothing. Most compilers support it (I actually know of none that does not support it in some way or another), but that's platform-specific. I do find the multi-file support useful when developing (it does not make the full build faster, but I find partial rebuilds too slow without it). David ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Combined versus separate build
On Wed, Jun 27, 2012 at 8:07 PM, Nathaniel Smith n...@pobox.com wrote: On Wed, Jun 27, 2012 at 7:50 PM, David Cournapeau courn...@gmail.com wrote: On Wed, Jun 27, 2012 at 7:17 PM, Nathaniel Smith n...@pobox.com wrote: Currently the numpy build system(s) support two ways of building numpy: either by compiling a giant concatenated C file, or by the more conventional route of first compiling each .c file to a .o file, and then linking those together. I gather from comments in the source code that the former is the traditional method, and the latter is the newer experimental approach. It's easy to break one of these builds without breaking the other (I just did this with the NA branch, and David had to clean up after me), and I don't see what value we really get from having both options -- it seems to just double the size of the test matrix without adding value. There is unfortunately a big value in it: there is no standard way in C to share symbols within a library without polluting the whole process namespace, except on windows where the default is to export nothing. Most compilers support it (I actually know of none that does not support it in some way or the others), but that's platform-specific. IIRC this isn't too tricky to arrange for with gcc No, which is why this is supported for gcc and windows :) , but why is this an issue in the first place for a Python extension module? Extension modules are opened without RTLD_GLOBAL, which means that they *never* export any symbols. At least, that's how it should work on Linux and most Unix-alikes; I don't know much about OS X's linker, except that it's unusual in other ways. The pragmatic answer is that if it were not an issue, python itself would not bother with it. Every single extension module in python itself is built from a single compilation unit. This is also why we have this awful system to export the numpy C API with array of function pointers instead of simply exporting things in a standard way. 
See this: http://docs.python.org/release/2.5.3/ext/using-cobjects.html Looking quickly at the 2.7.3 sources, the more detailed answer is that python actually does not use RTLD_LOCAL (nor RTLD_GLOBAL), and what happens when neither of them is used is implementation-dependent. It seems to be RTLD_LOCAL on linux, and RTLD_GLOBAL on mac os x. There also may be consequences on the use of RTLD_LOCAL in embedded mode (I have ancient and bad memories with matlab related to this, but I forgot the details). David ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Combined versus separate build
On Wed, Jun 27, 2012 at 8:53 PM, Nathaniel Smith n...@pobox.com wrote: On Wed, Jun 27, 2012 at 8:29 PM, David Cournapeau courn...@gmail.com wrote: On Wed, Jun 27, 2012 at 8:07 PM, Nathaniel Smith n...@pobox.com wrote: On Wed, Jun 27, 2012 at 7:50 PM, David Cournapeau courn...@gmail.com wrote: On Wed, Jun 27, 2012 at 7:17 PM, Nathaniel Smith n...@pobox.com wrote: Currently the numpy build system(s) support two ways of building numpy: either by compiling a giant concatenated C file, or by the more conventional route of first compiling each .c file to a .o file, and then linking those together. I gather from comments in the source code that the former is the traditional method, and the latter is the newer experimental approach. It's easy to break one of these builds without breaking the other (I just did this with the NA branch, and David had to clean up after me), and I don't see what value we really get from having both options -- it seems to just double the size of the test matrix without adding value. There is unfortunately a big value in it: there is no standard way in C to share symbols within a library without polluting the whole process namespace, except on windows where the default is to export nothing. Most compilers support it (I actually know of none that does not support it in some way or the others), but that's platform-specific. IIRC this isn't too tricky to arrange for with gcc No, which is why this is supported for gcc and windows :) , but why is this an issue in the first place for a Python extension module? Extension modules are opened without RTLD_GLOBAL, which means that they *never* export any symbols. At least, that's how it should work on Linux and most Unix-alikes; I don't know much about OS X's linker, except that it's unusual in other ways. The pragmatic answer is that if it were not an issue, python itself would not bother with it. Every single extension module in python itself is built from a single compilation unit. 
This is also why we have this awful system to export the numpy C API with an array of function pointers instead of simply exporting things in a standard way. The array-of-function-pointers is solving the opposite problem, of exporting functions *without* having global symbols. I meant that the lack of a standard around symbols and namespaces is why we have to do those hacks. Most platforms have much better solutions to those problems. See this: http://docs.python.org/release/2.5.3/ext/using-cobjects.html Looking quickly at the 2.7.3 sources, the more detailed answer is that python actually does not use RTLD_LOCAL (nor RTLD_GLOBAL), and what happens when neither of them is used is implementation-dependent. It seems to be RTLD_LOCAL on linux, and RTLD_GLOBAL on mac os x. There also may be consequences on the use of RTLD_LOCAL in embedded mode (I have ancient and bad memories with matlab related to this, but I forgot the details). See, I knew OS X was quirky :-). That's what I get for trusting dlopen(3). But seriously, what compilers do we support that don't have -fvisibility=hidden? ...Is there even a list of compilers we support available anywhere? Well, I am not sure how all this is handled on the big guys (BlueGene and co), for one. There is also the issue of the consequences of statically linking numpy into python: I don't know what they are (I would actually like to make statically linking numpy into python easier, not harder). David ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Combined versus separate build
On Wed, Jun 27, 2012 at 8:57 PM, Dag Sverre Seljebotn d.s.seljeb...@astro.uio.no wrote: On 06/27/2012 09:53 PM, Nathaniel Smith wrote: On Wed, Jun 27, 2012 at 8:29 PM, David Cournapeaucourn...@gmail.com wrote: On Wed, Jun 27, 2012 at 8:07 PM, Nathaniel Smithn...@pobox.com wrote: On Wed, Jun 27, 2012 at 7:50 PM, David Cournapeaucourn...@gmail.com wrote: On Wed, Jun 27, 2012 at 7:17 PM, Nathaniel Smithn...@pobox.com wrote: Currently the numpy build system(s) support two ways of building numpy: either by compiling a giant concatenated C file, or by the more conventional route of first compiling each .c file to a .o file, and then linking those together. I gather from comments in the source code that the former is the traditional method, and the latter is the newer experimental approach. It's easy to break one of these builds without breaking the other (I just did this with the NA branch, and David had to clean up after me), and I don't see what value we really get from having both options -- it seems to just double the size of the test matrix without adding value. There is unfortunately a big value in it: there is no standard way in C to share symbols within a library without polluting the whole process namespace, except on windows where the default is to export nothing. Most compilers support it (I actually know of none that does not support it in some way or the others), but that's platform-specific. IIRC this isn't too tricky to arrange for with gcc No, which is why this is supported for gcc and windows :) , but why is this an issue in the first place for a Python extension module? Extension modules are opened without RTLD_GLOBAL, which means that they *never* export any symbols. At least, that's how it should work on Linux and most Unix-alikes; I don't know much about OS X's linker, except that it's unusual in other ways. The pragmatic answer is that if it were not an issue, python itself would not bother with it. 
Every single extension module in python itself is built from a single compilation unit. This is also why we have this awful system to export the numpy C API with array of function pointers instead of simply exporting things in a standard way. The array-of-function-pointers is solving the opposite problem, of exporting functions *without* having global symbols. See this: http://docs.python.org/release/2.5.3/ext/using-cobjects.html Looking quickly at the 2.7.3 sources, the more detailed answer is that python actually does not use RTLD_LOCAL (nor RTLD_GLOBAL), and what happens when neither of them is used is implementation-dependent. It seems to be RTLD_LOCAL on linux, and RTLD_GLOBAL on mac os x. There also may be consequences on the use of RTLD_LOCAL in embedded mode (I have ancient and bad memories with matlab related to this, but I forgot the details). See, I knew OS X was quirky :-). That's what I get for trusting dlopen(3). But seriously, what compilers do we support that don't have -fvisibility=hidden? ...Is there even a list of compilers we support available anywhere? You could at the very least switch the default for a couple of releases, introducing a new flag with a "please email numpy-discussion if you use this" note, and see if anybody complains? Yes, we could. That's actually why I set up travis-CI to build both configurations in the first place :) (see https://github.com/numpy/numpy/issues/315) David ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Dropping support for Python 2.4 in NumPy 1.8
Hi Travis, On Thu, Jun 28, 2012 at 1:25 PM, Travis Oliphant tra...@continuum.io wrote: Hey all, I'd like to propose dropping support for Python 2.4 in NumPy 1.8 (not the 1.7 release). What does everyone think of that? I think it would depend on 1.7's state. I am unwilling to drop support for 2.4 in 1.8 unless we make 1.7 an LTS that would be supported up to 2014 Q1 (when RHEL5 stops getting security fixes - RHEL 5 is the one platform that warrants supporting 2.4 IMO). In my mind, it means 1.7 needs to be stable. Ondrej's (and others') work to make sure we break neither API nor ABI, as in the last few releases, would help achieve that. David ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Non-deterministic test failure in master
On Thu, Jun 28, 2012 at 8:06 PM, Nathaniel Smith n...@pobox.com wrote: On Thu, Jun 28, 2012 at 7:13 AM, Pierre Haessig pierre.haes...@crans.org wrote: Hi Nathaniel, On 27/06/2012 20:22, Nathaniel Smith wrote: According to the Travis-CI build logs, this code produces non-deterministic behaviour in master: You mean non-deterministic across different builds, not across different executions on the same build, right ? I just ran a small loop:

N = 1
N_good = 0
for i in range(N):
    a = np.arange(5)
    a[:3] = a[2:]
    if (a == [2, 3, 4, 3, 4]).all():
        N_good += 1
print 'good result : %d/%d' % (N_good, N)

and got 100 % good replication. Yes, the current hypothesis is that there is one particular Travis-CI machine on which memcpy goes backwards, and so the test fails whenever the build gets assigned to that machine. (Apparently this is actually faster on some CPUs, and new versions of glibc are known to exploit this.) see also this: https://bugzilla.redhat.com/show_bug.cgi?id=638477 David ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Strange problem
On Fri, Jun 29, 2012 at 9:54 AM, Uwe Schmitt uschm...@mineway.de wrote: Hi, I have unreproducible crashes on a customer's Win 7 machine with Python 2.7.2 and Numpy 1.6.1. He gets the following message:

Problem signature:
Problem Event Name: APPCRASH
Application Name: python.exe
Application Version: 0.0.0.0
Application Timestamp: 4df4ba7c
Fault Module Name: umath.pyd
Fault Module Version: 0.0.0.0
Fault Module Timestamp: 4e272b96
Exception Code: c005
Exception Offset: 0001983a
OS Version: 6.1.7601.2.1.0.256.4
Locale ID: 2055
Additional Information 1: 0a9e
Additional Information 2: 0a9e372d3b4ad19135b953a78882e789
Additional Information 3: 0a9e
Additional Information 4: 0a9e372d3b4ad19135b953a78882e789

I know that I can not expect a clear answer without more information, but my customer is on holidays and I just wanted to ask for some hints at possible reasons. The machine is not out of memory and despite this crash runs very stable. Is this on 32 or 64 bit windows ? Do you know if your customer uses only numpy, or other packages that depend on numpy C extensions ? David ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Combined versus separate build
On Sun, Jul 1, 2012 at 6:36 PM, Nathaniel Smith n...@pobox.com wrote: On Wed, Jun 27, 2012 at 9:05 PM, David Cournapeau courn...@gmail.com wrote: On Wed, Jun 27, 2012 at 8:53 PM, Nathaniel Smith n...@pobox.com wrote: But seriously, what compilers do we support that don't have -fvisibility=hidden? ...Is there even a list of compilers we support available anywhere? Well, I am not sure how all this is handled on the big guys (bluegen and co), for once. There is also the issue of the consequence on statically linking numpy to python: I don't what they are (I would actually like to make statically linked numpy into python easier, not harder). All the docs I can find in a quick google seem to say that bluegene doesn't do shared libraries at all, though those may be out of date. Also, it looks like our current approach is not doing a great job of avoiding symbol table pollution... despite all the NPY_NO_EXPORTS all over the source, I still count ~170 exported symbols on Linux with numpy 1.6, many of them with non-namespaced names (_n_to_n_data_copy, _next, npy_tan, etc.) Of course this is fixable, but it's interesting that no-one has noticed. (Current master brings this up to ~300 exported symbols.) It sounds like as far as our officially supported platforms go (linux/windows/osx with gcc/msvc), then the ideal approach would be to use -fvisibility=hidden or --retain-symbols-file to convince gcc to hide symbols by default, like msvc does. That would let us remove cruft from the source code, produce a more reliable result, and let us use the more convenient separate build, with no real downsides. What cruft would it allow us to remove ? Whatever method we use, we need a whitelist of symbols to export. On the exported list I see on mac, most of them are either from npymath (npy prefix) or npysort (no prefix, I think this should be added). Once those are ignored as they should be, there are 30 symbols exported. 
(Static linking is trickier because no-one uses it anymore so the docs aren't great, but I think on Linux at least you could accomplish the equivalent by building the static library with 'ld -r ... -o tmp-multiarray.a; objcopy --keep-global-symbol=initmultiarray tmp-multiarray.a multiarray.a'.) I am not sure why you say that static linking is not used anymore: I have met some people who do statically link numpy into python. Of course there are presumably other platforms that we don't support or test on, but where we have users anyway. Building on such a platform sort of intrinsically requires build system hacks, and some equivalent to the above may well be available (e.g. I know icc supports -fvisibility). So while I'm not going to do anything about this myself in the near future, I'd argue that it would be a good idea to: - Switch the build-system to export nothing by default when using gcc, using -fvisibility=hidden - Switch the default build to separate - Leave in the single-file build, but not officially supported, i.e., we're happy to get patches but it's not used on any systems that we can actually test ourselves. (I suspect it's less fragile than the separate build anyway, since name clashes are less common than forgotten include files.) I am fine with making the separate build the default (I have a patch somewhere that does that on supported platforms), but not with using -fvisibility=hidden. When I implemented the initial support around this, -fvisibility was buggy on some platforms, including mingw 3.x. I don't think changing what our implementation does here is worthwhile given that it works, and -fvisibility=hidden has no big advantages (you would still need to mark the functions to be exported). David ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Combined versus separate build
On Sun, Jul 1, 2012 at 8:32 PM, Nathaniel Smith n...@pobox.com wrote: On Sun, Jul 1, 2012 at 7:36 PM, David Cournapeau courn...@gmail.com wrote: On Sun, Jul 1, 2012 at 6:36 PM, Nathaniel Smith n...@pobox.com wrote: On Wed, Jun 27, 2012 at 9:05 PM, David Cournapeau courn...@gmail.com wrote: On Wed, Jun 27, 2012 at 8:53 PM, Nathaniel Smith n...@pobox.com wrote: But seriously, what compilers do we support that don't have -fvisibility=hidden? ...Is there even a list of compilers we support available anywhere? Well, I am not sure how all this is handled on the big guys (bluegen and co), for once. There is also the issue of the consequence on statically linking numpy to python: I don't what they are (I would actually like to make statically linked numpy into python easier, not harder). All the docs I can find in a quick google seem to say that bluegene doesn't do shared libraries at all, though those may be out of date. Also, it looks like our current approach is not doing a great job of avoiding symbol table pollution... despite all the NPY_NO_EXPORTS all over the source, I still count ~170 exported symbols on Linux with numpy 1.6, many of them with non-namespaced names (_n_to_n_data_copy, _next, npy_tan, etc.) Of course this is fixable, but it's interesting that no-one has noticed. (Current master brings this up to ~300 exported symbols.) It sounds like as far as our officially supported platforms go (linux/windows/osx with gcc/msvc), then the ideal approach would be to use -fvisibility=hidden or --retain-symbols-file to convince gcc to hide symbols by default, like msvc does. That would let us remove cruft from the source code, produce a more reliable result, and let us use the more convenient separate build, with no real downsides. What cruft would it allow us to remove ? Whatever method we use, we need a whitelist of symbols to export. 
No, right now we don't have a whitelist, we have a blacklist -- every time we add a new function or global variable, we have to remember to add a NPY_NO_EXPORT tag to its definition. Except the evidence says that we don't do that reliably. (Everyone always sucks at maintaining blacklists, that's the nature of blacklists.) I'm saying that we'd be better off if we did have a whitelist. Especially since the CPython API makes maintaining this whitelist so very trivial -- each module exports exactly one symbol! There may be some confusion on what NPY_NO_EXPORT does: it marks a function that can be used between compilation units but is not exported. The choice is between static and NPY_NO_EXPORT, not between NPY_NO_EXPORT and nothing. In that sense, marking something NPY_NO_EXPORT is a whitelist. If we were to use -fvisibility=hidden, we would still need to mark those functions static (as it would otherwise publish functions in the single-file build). Yes, of course, or I wouldn't have bothered researching it. But this research would have been easier if there were enough of a user base that the tool makers actually paid any attention to supporting this use case, is all I was saying :-). Of course there are presumably other platforms that we don't support or test on, but where we have users anyway. Building on such a platform sort of intrinsically requires build system hacks, and some equivalent to the above may well be available (e.g. I know icc supports -fvisibility). So while I'm not going to do anything about this myself in the near future, I'd argue that it would be a good idea to: - Switch the build-system to export nothing by default when using gcc, using -fvisibility=hidden - Switch the default build to separate - Leave in the single-file build, but not officially supported, i.e., we're happy to get patches but it's not used on any systems that we can actually test ourselves. 
(I suspect it's less fragile than the separate build anyway, since name clashes are less common than forgotten include files.) I am fine with making the separate build the default (I have a patch somewhere that does that on supported platforms), but not with using -fvisibility=hidden. When I implemented the initial support around this, -fvisibility was buggy on some platforms, including mingw 3.x. It's true that mingw doesn't support -fvisibility=hidden, but that's because it would be a no-op; windows already works that way by default... That's not my understanding: gcc behaves on windows as on linux (it would break too much of the software that is the usual target of mingw otherwise), but the -fvisibility flag is broken on gcc 3.x. The more recent mingw supposedly handles this better, but we can't use gcc 4.x because of another issue regarding private dll sharing :) David ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] import numpy performance
On Mon, Jul 2, 2012 at 8:17 PM, Andrew Dalke da...@dalkescientific.com wrote: In this email I propose a few changes which I think are minor and which don't really affect the external NumPy API but which I think could improve the import numpy performance by at least 40%. This affects me because I and my clients use a chemistry toolkit which uses only NumPy arrays, and where we run short programs often on the command-line. In July of 2008 I started a thread about how import numpy was noticeably slow for one of my customers. They had chemical analysis software, often even run on a single molecular structure using command-line tools, and the several invocations with 0.1 seconds overhead was one of the dominant costs even when numpy wasn't needed. I fixed most of their problems by deferring numpy imports until needed. I remember well the Steve Jobs anecdote at http://folklore.org/StoryView.py?project=Macintoshstory=Saving_Lives.txt and spent another day of my time in 2008 to identify the parts of the numpy import sequence which seemed excessive. I managed to get the import time down from 0.21 seconds to 0.08 seconds. I will answer your other remarks later, but 0.21 sec to import numpy is very slow, especially on a recent computer. It is 0.095 sec on my mac, and 0.075 sec on a linux VM on the same computer (both with a hot cache of course). Importing multiarray.so only is negligible for me (i.e. the difference between python -c "import multiarray" and a bare python -c is statistically insignificant). I would check external factors, like the size of your sys.path, as well. David ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] import numpy performance
On Mon, Jul 2, 2012 at 11:15 PM, Andrew Dalke da...@dalkescientific.com wrote: On Jul 2, 2012, at 11:38 PM, Fernando Perez wrote: No, that's the wrong thing to test, because it effectively amounts to 'import numpy', since the numpy __init__ file is still executed. As David indicated, you must import multiarray.so by itself. I understand that clarification. However, it does not affect me. It is indeed irrelevant to your end goal, but it does affect the interpretation of what import_array does, and thus of your benchmark. polynomial is definitely the big new overhead (I don't remember it being significant last time I optimized numpy import times); it is roughly 30% of the total cost of importing numpy (95 -> 70 ms total time, of which numpy went from 70 to 50 ms). Then ctypeslib and test are the two other significant ones. I use profile_imports.py from bzr as follows:

import sys
import profile_imports
profile_imports.install()
import numpy
profile_imports.log_stack_info(sys.stdout)

Focusing on polynomial seems the only sensible action. Except for test, all the other stuff seems difficult to change without breaking anything. David
Re: [Numpy-discussion] Combined versus separate build
On Mon, Jul 2, 2012 at 11:34 PM, Nathaniel Smith n...@pobox.com wrote: To be clear, this subthread started with the caveat *as far as our officially supported platforms go* -- I'm not saying that we should go around and remove all the NPY_NO_EXPORT macros tomorrow. However, the only reason they're actually needed is for supporting platforms where you can't control symbol visibility from the linker, and AFAICT we have no examples of such platforms to hand. I gave you one, mingw 3.x. Actually, reading a bit more around, it seems this is not specific to mingw, but all gcc 4 (http://gcc.gnu.org/gcc-4.0/changes.html#visibility) I don't have windows to test, but everyone else on the internet seems to think mingw works the way I said, with __declspec and all... you aren't thinking of cygwin, are you? (see e.g. http://mingw.org/wiki/sampleDLL) Well, I did check myself, but looking more into it, I was tricked by nm output, which makes little sense on windows w.r.t. visibility with dlls. You can define the same function in multiple dlls, and they will all appear as public symbols (T label with nm), but the windows linker will not see them when linking an executable. I am still biased toward the conservative option, especially as it is still followed by pretty much every C extension out there (including python itself). I trust their experience in dealing with cross-platform issues more than ours. I cannot find my patch for detecting platforms where this can safely become the default; I will prepare a new one. David
Re: [Numpy-discussion] Code Freeze for NumPy 1.7
On Sun, Jul 15, 2012 at 5:42 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Sun, Jul 15, 2012 at 10:32 AM, Ralf Gommers ralf.gomm...@googlemail.com wrote: On Sun, Jul 15, 2012 at 5:57 PM, Nathaniel Smith n...@pobox.com wrote: On Sun, Jul 15, 2012 at 1:08 PM, Ralf Gommers ralf.gomm...@googlemail.com wrote: On Sun, Jul 15, 2012 at 12:45 AM, Travis Oliphant tra...@continuum.io wrote: Hey all, We are nearing a code-freeze for NumPy 1.7. Are there any last-minute changes people are wanting to push into NumPy 1.7? We should discuss them as soon as possible. I'm proposing a code-freeze at midnight UTC on July 18th (7:00pm CDT on July 17th). This will allow the creation of beta releases of NumPy on the 18th of July. This is a few days later than originally hoped for --- largely due to unexpected travel schedules of Ondrej and I, but it does give people a few more days to get patches in. Of course, we will be able to apply bug-fixes to the 1.7.x branch once the tag is made. What about the tickets still open for 1.7.0 (http://projects.scipy.org/numpy/report/3)? There are a few important ones left. These I would consider blockers: - #2108 Datetime failures with MinGW Is there a description anywhere of what the problem actually is here? I looked at the ticket, which referred to a PR, and it's hard to work out from the PR discussion what the actual remaining test failures are -- and there definitely doesn't seem to be any description of the underlying problem. (Something about working 64-bit time_t on windows being difficult depending on the compiler used?) There's a lot more discussion on http://projects.scipy.org/numpy/ticket/1909 https://github.com/numpy/numpy/pull/156 https://github.com/numpy/numpy/pull/161. The issue is that for MinGW 3.x some _s / _t functions seem to be missing. And we don't yet support MinGW 4.x. 
Current issues can be seen from the last test log on our Windows XP buildbot (June 29, http://buildbot.scipy.org/builders/Windows_XP_x86/builds/1124/steps/shell_1/logs/stdio):

======================================================================
ERROR: test_datetime_arange (test_datetime.TestDateTime)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\buildbot\numpy\b11\numpy-install\Lib\site-packages\numpy\core\tests\test_datetime.py", line 1351, in test_datetime_arange
    assert_raises(ValueError, np.arange, np.datetime64('today'),
OSError: Failed to use '_localtime64_s' to convert to a local time

======================================================================
ERROR: test_datetime_y2038 (test_datetime.TestDateTime)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\buildbot\numpy\b11\numpy-install\Lib\site-packages\numpy\core\tests\test_datetime.py", line 1706, in test_datetime_y2038
    a = np.datetime64('2038-01-20T13:21:14')
OSError: Failed to use '_gmtime64_s' to convert to a UTC time

======================================================================
ERROR: test_pydatetime_creation (test_datetime.TestDateTime)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\buildbot\numpy\b11\numpy-install\Lib\site-packages\numpy\core\tests\test_datetime.py", line 467, in test_pydatetime_creation
    a = np.array(['today', datetime.date.today()], dtype='M8[D]')
OSError: Failed to use '_localtime64_s' to convert to a local time

======================================================================
ERROR: test_string_parser_variants (test_datetime.TestDateTime)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\buildbot\numpy\b11\numpy-install\Lib\site-packages\numpy\core\tests\test_datetime.py", line 1054, in test_string_parser_variants
    assert_equal(np.array(['1980-02-29T01:02:03'], np.dtype('M8[s]')),
OSError: Failed to use '_gmtime64_s' to convert to a UTC time

======================================================================
ERROR: test_timedelta_scalar_construction_units (test_datetime.TestDateTime)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\buildbot\numpy\b11\numpy-install\Lib\site-packages\numpy\core\tests\test_datetime.py", line 287, in test_timedelta_scalar_construction_units
    assert_equal(np.datetime64('2010-03-12T17').dtype,
OSError: Failed to use '_gmtime64_s' to convert to a UTC time

======================================================================
ERROR: Failure: OSError (Failed to use '_gmtime64_s' to convert to a UTC time)
----------------------------------------------------------------------
Traceback (most recent call last):
  File
Re: [Numpy-discussion] Lazy imports again
On Mon, Jul 16, 2012 at 5:28 PM, Charles R Harris charlesr.har...@gmail.com wrote: Hi All, Working lazy imports would be useful to have. Ralf is opposed to the idea because it caused all sorts of problems on different platforms when it was tried in scipy. I thought I'd open the topic for discussion so that folks who had various problems/solutions could offer input and the common experience could be collected in one place. Perhaps there is a solution that actually works. I have never seen a lazy import system that did not cause issues in one way or the other. Lazy imports make a lot of sense for an application (e.g. mercurial), but I think it is a mistake to solve this at the numpy level. This should be solved at the application level, and there are solutions for that. For example, using the demandimport code from mercurial (GPL) cuts down the numpy import time by a factor of 3 on my mac if one uses np.zeros (100 ms -> 50 ms, of which 25 are taken by python itself):

import demandimport
demandimport.enable()
import numpy as np
a = np.zeros(10)

To help people who need fast numpy imports, I would suggest the following course of action: - start benchmarking numpy import in a per-commit manner to detect significant regressions (like what happened with the polynomial code) - have a small FAQ on it, with suggestions for people who need to optimize their short-lived scripts cheers, David
Re: [Numpy-discussion] Lazy imports again
On Tue, Jul 17, 2012 at 1:13 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Tue, Jul 17, 2012 at 1:31 AM, David Cournapeau courn...@gmail.com wrote: On Mon, Jul 16, 2012 at 5:28 PM, Charles R Harris charlesr.har...@gmail.com wrote: Hi All, Working lazy imports would be useful to have. Ralf is opposed to the idea because it caused all sorts of problems on different platforms when it was tried in scipy. I thought I'd open the topic for discussion so that folks who had various problems/solutions could offer input and the common experience could be collected in one place. Perhaps there is a solution that actually works. I have never seen a lazy import system that did not cause issues in one way or the other. Lazy imports make a lot of sense for an application (e.g. mercurial), but I think it is a mistake to solve this at the numpy level. This should be solved at the application level, and there are solutions for that. For example, using the demandimport code from mercurial (GPL) cuts down the numpy import time by a factor of 3 on my mac if one uses np.zeros (100 ms -> 50 ms, of which 25 are taken by python itself):

import demandimport
demandimport.enable()
import numpy as np
a = np.zeros(10)

To help people who need fast numpy imports, I would suggest the following course of action: - start benchmarking numpy import in a per-commit manner to detect significant regressions (like what happened with the polynomial code) - have a small FAQ on it, with suggestions for people who need to optimize their short-lived scripts That's really interesting. I'd like to see some folks try that solution. Anyone can :) the file is self-contained, last time I checked: http://www.selenic.com/hg/file/67b8cca2f12b/mercurial/demandimport.py cheers, David
Re: [Numpy-discussion] Symbol table not found compiling numpy from git repository on Windows
On Wed, Jul 18, 2012 at 11:38 AM, Ondřej Čertík ondrej.cer...@gmail.com wrote: On Wed, Jul 18, 2012 at 12:30 PM, Ondřej Čertík ondrej.cer...@gmail.com wrote: On Wed, Jul 18, 2012 at 2:20 AM, Ondřej Čertík ondrej.cer...@gmail.com wrote: On Thu, Jan 5, 2012 at 8:22 PM, John Salvatier jsalv...@u.washington.edu wrote: Hello, I'm trying to compile numpy on Windows 7 using the command: python setup.py config --compiler=mingw32 build but I get an error about a symbol table not found. Anyone know how to work around this or what to look into?

building library "npymath" sources
Building msvcr library: "C:\Python26\libs\libmsvcr90.a" (from C:\Windows\winsxs\amd64_microsoft.vc90.crt_1fc8b3b9a1e18e3b_9.0.21022.8_none_750b37ff97f4f68b\msvcr90.dll)
objdump.exe: C:\Windows\winsxs\amd64_microsoft.vc90.crt_1fc8b3b9a1e18e3b_9.0.21022.8_none_750b37ff97f4f68b\msvcr90.dll: File format not recognized
Traceback (most recent call last):
  File "setup.py", line 214, in <module>
    setup_package()
  File "setup.py", line 207, in setup_package
    configuration=configuration )
  File "C:\Users\jsalvatier\workspace\numpy\numpy\distutils\core.py", line 186, in setup
    return old_setup(**new_attr)
  File "C:\Python26\lib\distutils\core.py", line 152, in setup
    dist.run_commands()
  File "C:\Python26\lib\distutils\dist.py", line 975, in run_commands
    self.run_command(cmd)
  File "C:\Python26\lib\distutils\dist.py", line 995, in run_command
    cmd_obj.run()
  File "C:\Users\jsalvatier\workspace\numpy\numpy\distutils\command\build.py", line 37, in run
    old_build.run(self)
  File "C:\Python26\lib\distutils\command\build.py", line 134, in run
    self.run_command(cmd_name)
  File "C:\Python26\lib\distutils\cmd.py", line 333, in run_command
    self.distribution.run_command(command)
  File "C:\Python26\lib\distutils\dist.py", line 995, in run_command
    cmd_obj.run()
  File "C:\Users\jsalvatier\workspace\numpy\numpy\distutils\command\build_src.py", line 152, in run
    self.build_sources()
  File "C:\Users\jsalvatier\workspace\numpy\numpy\distutils\command\build_src.py", line 163, in build_sources
    self.build_library_sources(*libname_info)
  File "C:\Users\jsalvatier\workspace\numpy\numpy\distutils\command\build_src.py", line 298, in build_library_sources
    sources = self.generate_sources(sources, (lib_name, build_info))
  File "C:\Users\jsalvatier\workspace\numpy\numpy\distutils\command\build_src.py", line 385, in generate_sources
    source = func(extension, build_dir)
  File "numpy\core\setup.py", line 646, in get_mathlib_info
    st = config_cmd.try_link('int main(void) { return 0;}')
  File "C:\Python26\lib\distutils\command\config.py", line 257, in try_link
    self._check_compiler()
  File "C:\Users\jsalvatier\workspace\numpy\numpy\distutils\command\config.py", line 45, in _check_compiler
    old_config._check_compiler(self)
  File "C:\Python26\lib\distutils\command\config.py", line 107, in _check_compiler
    dry_run=self.dry_run, force=1)
  File "C:\Users\jsalvatier\workspace\numpy\numpy\distutils\ccompiler.py", line 560, in new_compiler
    compiler = klass(None, dry_run, force)
  File "C:\Users\jsalvatier\workspace\numpy\numpy\distutils\mingw32ccompiler.py", line 94, in __init__
    msvcr_success = build_msvcr_library()
  File "C:\Users\jsalvatier\workspace\numpy\numpy\distutils\mingw32ccompiler.py", line 362, in build_msvcr_library
    generate_def(dll_file, def_file)
  File "C:\Users\jsalvatier\workspace\numpy\numpy\distutils\mingw32ccompiler.py", line 282, in generate_def
    raise ValueError("Symbol table not found")
ValueError: Symbol table not found

Did you find a workaround? I am having exactly the same problem.
So this happens both in Windows and in Wine, and the problem is that numpy distutils is trying to read the symbol table using objdump from msvcr90.dll but it can't recognize the format:

objdump.exe: C:\windows\winsxs\x86_microsoft.vc90.crt_1fc8b3b9a1e18e3b_9.0.30729.4148_none_deadbeef\msvcr90.dll: File format not recognized

The file exists:

$ file ~/.wine/drive_c/windows/winsxs/x86_microsoft.vc90.crt_1fc8b3b9a1e18e3b_9.0.30729.4148_none_deadbeef/msvcr90.dll
/home/ondrej/.wine/drive_c/windows/winsxs/x86_microsoft.vc90.crt_1fc8b3b9a1e18e3b_9.0.30729.4148_none_deadbeef/msvcr90.dll: PE32 executable for MS Windows (DLL) (unknown subsystem) Intel 80386 32-bit

But objdump doesn't work on it. So the following patch fixes it:

diff --git a/numpy/distutils/mingw32ccompiler.py b/numpy/distutils/mingw32ccompi
index 5b9aa33..72ff5ed 100644
--- a/numpy/distutils/mingw32ccompiler.py
+++ b/numpy/distutils/mingw32ccompiler.py
@@ -91,11 +91,11 @@ class Mingw32CCompiler(distutils.cygwinccompiler.CygwinCComp
         build_import_library()

         # Check for custom msvc runtime library on Windows. Build if it doesn't
-        msvcr_success = build_msvcr_library()
-        msvcr_dbg_success = build_msvcr_library(debug=True)
-
Re: [Numpy-discussion] Segfault in mingw in test_arrayprint.TestComplexArray
On Fri, Jul 20, 2012 at 12:24 PM, Ondřej Čertík ondrej.cer...@gmail.com wrote: So I have tried the MinGW-5.0.3.exe in Wine, but it tries to install from some wrong url and it fails to install. I have unpacked the tarballs by hand into ~/.wine/drive_c/MinGW: Not surprising, that MinGW is really getting old. It's still the last available one with gcc 3.x, IIRC. To make things reproducible, I've put all my packages in this repository: https://github.com/certik/numpy-vendor

binutils-2.17.50-20070129-1.tar.gz
w32api-3.7.tar.gz
gcc-g77-3.4.5-20051220-1.tar.gz
gcc-g++-3.4.5-20051220-1.tar.gz
gcc-core-3.4.5-20051220-1.tar.gz
mingw-runtime-3.10.tar.gz

also in the same directory, I had to do: cp ../windows/system32/msvcr90.dll lib/ Looks like I have an older Wine, not sure if it makes a difference:

$ locate msvcr90.dll
/Users/rgommers/.wine/drive_c/windows/winsxs/x86_Microsoft.VC90.CRT_1fc8b3b9a1e18e3b_9.0.21022.8_x-ww_d08d0375/msvcr90.dll
/Users/rgommers/__wine/drive_c/windows/winsxs/x86_Microsoft.VC90.CRT_1fc8b3b9a1e18e3b_9.0.21022.8_x-ww_d08d0375/msvcr90.dll
$ locate msvcr71.dll
/Users/rgommers/.wine/drive_c/windows/system32/msvcr71.dll
/Users/rgommers/Code/wine/dlls/msvcr71/msvcr71.dll.fake
/Users/rgommers/Code/wine/dlls/msvcr71/msvcr71.dll.so
/Users/rgommers/__wine/drive_c/windows/system32/msvcr71.dll
/Users/rgommers/wine/build/wine-1.1.39/dlls/msvcr71/msvcr71.dll.fake
/Users/rgommers/wine/build/wine-1.1.39/dlls/msvcr71/msvcr71.dll.so
/Users/rgommers/wine/wine-1.1.39/lib/wine/fakedlls/msvcr71.dll
/Users/rgommers/wine/wine-1.1.39/lib/wine/msvcr71.dll.so
/usr/local/lib/wine/fakedlls/msvcr71.dll
/usr/local/lib/wine/msvcr71.dll.so

Actually, I made a mistake --- the one in drive_c/windows/system32/msvcr90.dll does not work for me. The one I use is installed by the Python installer (as I found out) and it is in: drive_c/windows/winsxs/x86_Microsoft.VC90.CRT_1fc8b3b9a1e18e3b_9.0.21022.8_x-ww_d08d0375/msvcr90.dll Which seems to be the same as the one that you use.
Just in case, I've put it here: https://github.com/certik/numpy-vendor/blob/master/msvcr90.dll Also I've added the bin directory to PATH using the following trick:

$ cat > tmp << EOF
REGEDIT4

[HKEY_CURRENT_USER\Environment]
"PATH"="C:\\MinGW\\bin"
EOF
$ wine regedit tmp

Then I built and installed numpy using:

wine C:\Python27\python setup.py build --compiler=mingw32 install

And now there is no segfault when constructing a complex array! So newer (newest) mingw miscompiles NumPy somehow... Anyway, running tests, it gets much farther than before; now it hangs at:

test_multiarray.TestIO.test_ascii ...
err:ntdll:RtlpWaitForCriticalSection section 0x785b7428 ? wait timed out in thread 0009, blocked by , retrying (60 sec)
fixme:keyboard:X11DRV_ActivateKeyboardLayout 0x4090409, : semi-stub!
err:ntdll:RtlpWaitForCriticalSection section 0x785b7428 ? wait timed out in thread 0009, blocked by , retrying (60 sec)
err:ntdll:RtlpWaitForCriticalSection section 0x785b7428 ? wait timed out in thread 0009, blocked by , retrying (60 sec)
...

Not sure what this problem is yet. This however is a big problem. I've tested it on the actual Windows 64bit XP box, and the test simply segfaults at this place. Ralf, I should note that your latest scipy RC tests also segfault on my Windows machine, so maybe something is wrong with the machine... I have some good news for numpy, but bad news for you :) - first, building numpy and testing mostly work for me (tried the last commit from the 1.7.x branch) with mingw 5.0.4 and python 2.7.3, *without* any change in the code (i.e. I did not comment out the part that builds the msvcr90 import library). - I don't know what the issue is in your environment for msvc90, but I can confirm that it is required. gcc 3.x, which was built around 2005/2006, cannot possibly provide the import library for msvcr90, and yet the build works ok - I strongly suspect some issues because you started with mingw / gcc 4.x.
If you moved some libraries into system directories, I suggest you start fresh from a clean state in your VM (or rm -rf .wine :) ). I noticed that when VS 2008 is available, distutils does the configuration with the MS compilers, which is broken. I will test later on a machine without VS 2008. cheers, David
Re: [Numpy-discussion] Segfault in mingw in test_arrayprint.TestComplexArray
On Thu, Jul 19, 2012 at 4:58 PM, Ondřej Čertík ondrej.cer...@gmail.com wrote: So I have tried the MinGW-5.0.3.exe in Wine, but it tries to install from some wrong url and it fails to install. I have unpacked the tarballs by hand into ~/.wine/drive_c/MinGW: binutils-2.17.50-20070129-1.tar.gz w32api-3.7.tar.gz gcc-g77-3.4.5-20051220-1.tar.gz gcc-g++-3.4.5-20051220-1.tar.gz gcc-core-3.4.5-20051220-1.tar.gz mingw-runtime-3.10.tar.gz also in the same directory, I had to do: cp ../windows/system32/msvcr90.dll lib/ I think that's your problem right there. You should not need to do that, and doing so will likely result in having multiple copies of the DLL in your process (you can confirm with process dependency walker). This should be avoided at all cost, as the python C API is not designed to deal with this, and your crashes are pretty typical of what happens in those cases. David
Re: [Numpy-discussion] Status of NumPy and Python 3.3
On Fri, Jul 27, 2012 at 7:30 AM, Travis Oliphant tra...@continuum.io wrote: Hey all, I'm wondering who has tried to make NumPy work with Python 3.3. The Unicode handling was significantly improved in Python 3.3 and the array-scalar code (which assumed a certain structure for UnicodeObjects) is not working now. It would be nice to get 1.7.0 working with Python 3.3 if possible before the release. Anyone interested in tackling that little challenge? If someone has already tried it would be nice to hear your experience. Given that we're late with 1.7, I would suggest passing this to the next release, unless the fix is simple (just a change of API). cheers, David
Re: [Numpy-discussion] Status of NumPy and Python 3.3
On Fri, Jul 27, 2012 at 9:28 AM, David Cournapeau courn...@gmail.com wrote: On Fri, Jul 27, 2012 at 7:30 AM, Travis Oliphant tra...@continuum.io wrote: Hey all, I'm wondering who has tried to make NumPy work with Python 3.3. The Unicode handling was significantly improved in Python 3.3 and the array-scalar code (which assumed a certain structure for UnicodeObjects) is not working now. It would be nice to get 1.7.0 working with Python 3.3 if possible before the release. Anyone interested in tackling that little challenge? If someone has already tried it would be nice to hear your experience. Given that we're late with 1.7, I would suggest passing this to the next release, unless the fix is simple (just a change of API). I took a brief look at it, and from the errors I have seen, one is cosmetic, the other one is a bit more involved (rewriting PyArray_Scalar unicode support). While it is not difficult in nature, the current code has multiple #ifdefs of Py_UNICODE_WIDE, meaning it would require multiple configurations on multiple python versions to be tested. I don't think python 3.3 support is critical - people who want to play with beta interpreters can build numpy by themselves from master, so I am -1 on integrating this into 1.7. I may have a fix for it by tonight, though. David
[Numpy-discussion] Moving away from using accelerate framework on mac os x ?
Hi, During last PyCon, Olivier Grisel (of scikits-learn fame) and myself looked into a nasty bug on mac os x: https://gist.github.com/2027412. The short story is that I believe this means numpy cannot be used with multiprocessing if linked against the accelerate framework, and as such we should think about giving up on accelerate and using e.g. ATLAS on mac for our official binaries. Long story: we recently received an answer where the engineers mention that using blas on each 'side' of a fork is not supported. The meat of the email is attached below. Thoughts? David -- Forwarded message -- From: devb...@apple.com Date: 2012/8/2 Subject: Bug ID 11036478: Segfault when calling dgemm with Accelerate / GCD after in a forked process To: olivier.gri...@gmail.com Hi Olivier, Thank you for contacting us regarding Bug ID# 11036478. Thank you for filing this bug report. This usage of fork() is not supported on our platform. For API outside of POSIX, including GCD and technologies like Accelerate, we do not support usage on both sides of a fork(). For this reason among others, use of fork() without exec is discouraged in general in processes that use layers above POSIX. We recommend that you either restrict usage of blas to the parent or the child process but not both, or that you switch to using GCD or pthreads rather than forking to create parallelism.
Re: [Numpy-discussion] Moving away from using accelerate framework on mac os x ?
On Sat, Aug 4, 2012 at 12:14 PM, Aron Ahmadia a...@ahmadia.net wrote: Hi David, Apple's response here is somewhat confusing, but I will add that on the supercomputing side of things we rarely fork, as this is not well-supported from the vendors or the hardware (it's hard enough to performantly spawn 500,000 processes statically, doing this dynamically becomes even more challenging). This sounds like an issue in Python multiprocessing itself, as I guess many other Apple libraries will fail or crash with the fork-no-exec model. My suggestion would be that numpy continue to integrate with Accelerate but prefer a macports or brew supplied blas, if available. This should probably also be filed as a wont-fix bug on the tracker so anybody who hits the same problem knows that it's on the system side and not us. To be clear, I am not suggesting removing support for linking against accelerate, just moving away from it for our binary releases. David
Re: [Numpy-discussion] Unicode revisited
On Sat, Aug 4, 2012 at 12:58 PM, Stefan Krah stefan-use...@bytereef.org wrote: Nathaniel Smith n...@pobox.com wrote: On Sat, Aug 4, 2012 at 11:42 AM, Stefan Krah stefan-use...@bytereef.org wrote:

switch (descr->byteorder) {
case '<': byteorder = -1;
case '>': byteorder = 1;
default: /* '=', '|' */ byteorder = 0;
}

I think you might want some breaks in here... Indeed. Shame on me for posting quick-and-dirty code. Maybe we should unit-test our emails too :) David
Re: [Numpy-discussion] building numpy 1.6.2 on OSX 10.6 / Python2.7.3
On Wed, Aug 8, 2012 at 6:15 AM, Andrew Nelson andyf...@gmail.com wrote: Dear Pierre, as indicated yesterday, OSX system python is in: /System/Library/Frameworks/Python.framework/ I am installing into: /Library/Frameworks/Python.framework/Versions/Current/lib/python2.7/site-packages This should not present a problem and does not explain why numpy does not build/import correctly on my setup. Please give us the build log (when rebuilding from scratch, to have the complete log) so that we can have a better idea of the issue. David
Re: [Numpy-discussion] Licensing question
On Wed, Aug 8, 2012 at 12:55 AM, Nathaniel Smith n...@pobox.com wrote: On Mon, Aug 6, 2012 at 8:31 PM, Robert Kern robert.k...@gmail.com wrote: Those are not the original Fortran sources. The original Fortran sources are in the public domain as work done by a US federal employee. http://www.netlib.org/fftpack/ Never trust the license of any code on John Burkardt's site. Track it down to the original sources. Taken together, what those websites seem to be claiming is that you have a choice of buggy BSD code or fixed GPL code? I assume someone has already taken the appropriate measures for numpy, but it seems like an unfortunate situation... If the code on John Burkardt's website is based on the netlib codebase, he is not entitled to make it GPL unless he is the sole copyright holder of the original code. I think the 'real' solution is to have a separate package linking to FFTW for people with 'advanced' needs for FFT. None of the other libraries I have looked at so far are usable, fast and precise enough when you go far from the simple case of double precision and 'well factored' size. regards, David
Re: [Numpy-discussion] Licensing question
On Wed, Aug 8, 2012 at 10:53 AM, Robert Kern robert.k...@gmail.com wrote: On Wed, Aug 8, 2012 at 10:34 AM, David Cournapeau courn...@gmail.com wrote: On Wed, Aug 8, 2012 at 12:55 AM, Nathaniel Smith n...@pobox.com wrote: On Mon, Aug 6, 2012 at 8:31 PM, Robert Kern robert.k...@gmail.com wrote: Those are not the original Fortran sources. The original Fortran sources are in the public domain as work done by a US federal employee. http://www.netlib.org/fftpack/ Never trust the license of any code on John Burkardt's site. Track it down to the original sources. Taken together, what those websites seem to be claiming is that you have a choice of buggy BSD code or fixed GPL code? I assume someone has already taken the appropriate measures for numpy, but it seems like an unfortunate situation... If the code on John Burkardt's website is based on the netlib codebase, he is not entitled to make it GPL unless he is the sole copyright holder of the original code. He can certainly incorporate the public domain code and rerelease it under whatever restrictions he likes, especially if he adds to it, which appears to be the case. The original sources are legitimately public domain, not just released under a liberal copyright license. He can't remove the original code from the public domain, but that's not what he claims to have done. I think the 'real' solution is to have a separate package linking to FFTW for people with 'advanced' needs for FFT. None of the other libraries I have looked at so far are usable, fast and precise enough when you go far from the simple case of double precision and 'well factored' size. http://pypi.python.org/pypi/pyFFTW Nice, I am starting to get out of touch with too many packages... Would be nice to add DCT and DST support to it. David
Re: [Numpy-discussion] Vagrant VM for building NumPy (1.7.x) Windows binaries
Hi Ondrej, On Mon, Aug 13, 2012 at 5:13 AM, Ondřej Čertík ondrej.cer...@gmail.com wrote: Hi, I've created this repository: https://github.com/certik/numpy-vendor which uses Vagrant and Fabric to fully automate the setup and creation of NumPy binaries for Windows. The setup is especially tricky; I've thought several times already that I nailed it, and then new things always pop up. One can of course install things directly in Ubuntu, but it's tricky, and there are a lot of things that can go wrong. The above approach should be 100% reproducible. So hopefully this repository will be useful for somebody new (like I am) to numpy releases. Also my hope is that more people can help out with the release just by running it on their machines and/or sending PRs against this repository. Thanks for doing this. I think vagrant is the way to go. I myself have some stuff for native windows and vagrant (much more painful, but sometimes necessary unfortunately). Did you see veewee to create vagrant boxes? It simplifies quite a few things, but maybe they matter more on windows than on linux, where this kind of thing is much simpler. David
Re: [Numpy-discussion] how to use numpy-vendor
Hi Ondrej, On Tue, Aug 14, 2012 at 5:34 AM, Ondřej Čertík ondrej.cer...@gmail.com wrote: Hi, How should one use the vendor repository (https://github.com/numpy/vendor) in Wine? Should I put the binaries into .wine/drive_c/Python25/libs/, or somewhere else? I've searched all the mailing lists and I didn't find any information on it. I vaguely remember that somebody mentioned it somewhere, but I am not able to find it. Once I understand it, I'll send a PR updating the README. There is no information on vendor: that's a repo I set up to avoid polluting the main repo with all the binary stuff that used to be in SVN. The principle is to put binaries used to *build* numpy there, but we don't put anything there for end-users. What binaries do you need to put there? Numpy binaries are usually put on sourceforge (although I would be more than happy to have a suggestion for a better way, because uploading on sourceforge is the very definition of pain). David
Re: [Numpy-discussion] how to use numpy-vendor
On Tue, Aug 14, 2012 at 11:22 AM, Nathaniel Smith n...@pobox.com wrote: On Tue, Aug 14, 2012 at 11:06 AM, David Cournapeau courn...@gmail.com wrote: [...] I think he's asking how to use the binaries in numpy-vendor to build a release version of numpy. Hm, good point, I don't know why I read it as putting .wine stuff into vendor rather than the opposite. Anyway, the way to use the binaries is to put them in some known location, e.g. C:\local ($WINEPREFIX/drive_c/local for Wine), and copy the nosse/sse2/sse3 directories in there. For example: C:\local\lib\yop\nosse C:\local\lib\yop\sse2 ... These are then referred to through environment variables by the pavement script (see https://github.com/numpy/numpy/blob/master/pavement.py#L143). Renaming yop to atlas would be a good idea; I don't know why I left that non-descriptive name in there. Manually, you can just do something like ATLAS=C:\local\lib\yop\sse2 python setup.py build, being careful about how environment variables are passed between the shell and Wine (I don't remember the details).
Note that nosse is not ATLAS but straight netlib libs, which is why in that case you need to use BLAS=... LAPACK=... I would strongly suggest not using OpenBLAS for this release, because of all the issues related to CPU tuning. We could certainly update a bit what we have in there, but building Windows binaries is enough of a pain that you don't want to do everything at once, I think; testing/building BLAS on Windows is especially time consuming. David
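The placement and manual invocation David describes above can be sketched as follows. The C:\local\lib\yop paths come from the mail itself, but the source layout of the vendor checkout is an assumption, so the commands are only echoed rather than executed:

```shell
# Sketch only: echo the commands instead of running them, since they
# require a real wine prefix and a vendor checkout.
WINEPREFIX="${WINEPREFIX:-$HOME/.wine}"
DEST="$WINEPREFIX/drive_c/local/lib/yop"       # C:\local\lib\yop seen from wine

echo "mkdir -p $DEST"
for variant in nosse sse2 sse3; do
    echo "cp -r vendor/binaries/$variant $DEST/"   # hypothetical source layout
done
# Env vars set in the POSIX shell are forwarded into the wine process,
# so this is how the pavement script selects which BLAS variant to link:
echo 'ATLAS=C:\local\lib\yop\sse2 wine python setup.py build'
```

For the nosse variant you would swap the ATLAS variable for BLAS/LAPACK ones pointing at the netlib libraries, as the mail notes.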
[Numpy-discussion] Preventing lossy cast for new float dtypes ?
Hi, I have started toying with implementing a quad-precision dtype for numpy on supported platforms, using __float128 + the quadmath lib from gcc. I have noticed invalid (and unexpected) downcasts to long double in some cases, especially for ufuncs (e.g. when I don't define my own ufunc for a given operation). Looking down into the numpy ufunc machinery, I can see that the issue comes from the assumption that long double is the highest possible precision for a float type, and the only way I can 'fix' this is to set kind to a value != 'f' in my dtype definition (in which case I get the expected invalid-cast exception). Is there a way to still avoid those casts while keeping the 'f' kind? thanks, David
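The assumption David is running into can be seen from Python as well: NumPy's promotion rules treat long double as the widest 'f'-kind type, so a wider third-party float that also reports kind 'f' can be considered castable down to it. A small illustration of the built-in ordering (this only demonstrates the stock promotion rules, not the quad-precision dtype itself):

```python
import numpy as np

# long double is the widest float in NumPy's built-in promotion lattice:
# every built-in float can be safely cast *up* to it...
assert np.can_cast(np.float64, np.longdouble)
# ...but not the other way around, since that would lose precision.
assert not np.can_cast(np.longdouble, np.float64)

# ufunc type resolution relies on this ordering, which is why a wider
# 'f'-kind dtype without a dedicated loop can end up silently resolved
# to a long double loop instead.
print(np.result_type(np.float32, np.longdouble))  # the long double dtype
```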
Re: [Numpy-discussion] 64bit infrastructure
On Tue, Aug 21, 2012 at 12:15 AM, Chris Barker chris.bar...@noaa.gov wrote: On Mon, Aug 20, 2012 at 3:51 PM, Travis Oliphant tra...@continuum.io wrote: I'm actually not sure why. I think the issue is making sure that the release manager can actually build NumPy without having to buy a particular compiler. The MS Express editions, while not open source, are free to use, and work fine. Not sure what to do about Fortran, though, but that's a scipy, not a numpy issue, yes? Fortran is the issue. Having one or two licenses of, say, the Intel Fortran compiler is not enough, because it makes it difficult for people to build on top of scipy. David