On Fri, Feb 17, 2012 at 3:38 PM, Ralf Gommers <ralf.gomm...@googlemail.com> wrote:
>
> On Fri, Feb 17, 2012 at 8:31 PM, Mark Wiebe <mwwi...@gmail.com> wrote:
>
>> On Fri, Feb 17, 2012 at 11:00 AM, Christopher Jordan-Squire
>> <cjord...@uw.edu> wrote:
>>
>>> On Fri, Feb 17, 2012 at 10:21 AM, Mark Wiebe <mwwi...@gmail.com> wrote:
>>> > On Fri, Feb 17, 2012 at 11:52 AM, Eric Firing <efir...@hawaii.edu>
>>> > wrote:
>>> >>
>>> >> On 02/17/2012 05:39 AM, Charles R Harris wrote:
>>> >> >
>>> >> > On Fri, Feb 17, 2012 at 8:01 AM, David Cournapeau
>>> >> > <courn...@gmail.com> wrote:
>>> >> >
>>> >> >     Hi Travis,
>>> >> >
>>> >> >     On Thu, Feb 16, 2012 at 10:39 PM, Travis Oliphant
>>> >> >     <tra...@continuum.io> wrote:
>>> >> >     > Mark Wiebe and I have been discussing off and on (as well
>>> >> >     > as talking with Charles) a good way forward to balance two
>>> >> >     > competing desires:
>>> >> >     >
>>> >> >     >   * addition of new features that are needed in NumPy
>>> >> >     >   * improving the code-base generally and moving towards a
>>> >> >     >     more maintainable NumPy
>>> >> >     >
>>> >> >     > I know there are loud voices for just focusing on the
>>> >> >     > second of these and avoiding the first until we have
>>> >> >     > finished that. I recognize the need to improve the code
>>> >> >     > base, but I will also be pushing for improvements to the
>>> >> >     > feature-set and user experience in the process.
>>> >> >     >
>>> >> >     > As a result, I am proposing a rough outline for releases
>>> >> >     > over the next year:
>>> >> >     >
>>> >> >     >   * NumPy 1.7 to come out as soon as the serious bugs can
>>> >> >     >     be eliminated. Bryan, Francesc, Mark, and I are able to
>>> >> >     >     help triage some of those.
>>> >> >     >
>>> >> >     >   * NumPy 1.8 to come out in July which will have as many
>>> >> >     >     ABI-compatible feature enhancements as we can add while
>>> >> >     >     improving test coverage and code cleanup. I will post
>>> >> >     >     to this list more details of what we plan to address
>>> >> >     >     with it later. Candidates for inclusion are:
>>> >> >     >       * resolving the NA/missing-data issues
>>> >> >     >       * finishing group-by
>>> >> >     >       * incorporating the start of label arrays
>>> >> >     >       * incorporating a meta-object
>>> >> >     >       * a few new dtypes (variable-length string,
>>> >> >     >         variable-length unicode and an enum type)
>>> >> >     >       * adding ufunc support for flexible dtypes and
>>> >> >     >         possibly structured arrays
>>> >> >     >       * allowing generalized ufuncs to work on more kinds
>>> >> >     >         of arrays besides just contiguous
>>> >> >     >       * improving the ability for NumPy to receive
>>> >> >     >         JIT-generated function pointers for ufuncs and
>>> >> >     >         other calculation opportunities
>>> >> >     >       * adding "filters" to Input and Output
>>> >> >     >       * simple computed fields for dtypes
>>> >> >     >       * accepting a Data-Type specification as a class or
>>> >> >     >         JSON file
>>> >> >     >       * work towards improving the dtype-addition mechanism
>>> >> >     >       * re-factoring of code so that it can compile with a
>>> >> >     >         C++ compiler and be minimally dependent on Python
>>> >> >     >         data-structures.
>>> >> >
>>> >> >     This is a pretty exciting list of features. What is the
>>> >> >     rationale for code being compiled as C++? IMO, it will be
>>> >> >     difficult to do so without preventing useful C constructs,
>>> >> >     and without removing some of the existing features (like our
>>> >> >     use of C99 complex). The subset that is both C and C++
>>> >> >     compatible is quite constraining.
>>> >> >
>>> >> >
>>> >> > I'm in favor of this myself, C++ would allow a lot of code
>>> >> > cleanup and make it easier to provide an extensible base, I
>>> >> > think it would be a natural fit with numpy. Of course, some C++
>>> >> > projects become tangled messes of inheritance, but I'd be very
>>> >> > interested in seeing what a good C++ designer like Mark,
>>> >> > intimately familiar with the numpy code base, could do. This
>>> >> > opportunity might not come by again anytime soon and I think we
>>> >> > should grab onto it. The initial step would be a release whose
>>> >> > code would compile as both C and C++, which mostly comes down to
>>> >> > removing C++ keywords like 'new'.
>>> >> >
>>> >> > I did suggest running it by you for build issues, so please
>>> >> > raise any you can think of. Note that MatPlotLib is in C++, so I
>>> >> > don't think the problems are insurmountable. And choosing a set
>>> >> > of compilers to support is something that will need to be done.
>>> >>
>>> >> It's true that matplotlib relies heavily on C++, both via the Agg
>>> >> library and in its own extension code. Personally, I don't like
>>> >> this; I think it raises the barrier to contributing. C++ is an
>>> >> order of magnitude more complicated than C--harder to read, and
>>> >> much harder to write, unless one is a true expert. In mpl it
>>> >> brings reliance on the CXX library, which Mike D. has had to help
>>> >> maintain. And if it does increase compiler specificity, that's
>>> >> bad.
>>> >
>>> >
>>> > This gets to the recruitment issue, which is one of the most
>>> > important problems I see numpy facing. I personally have contributed
>>> > a lot of code to NumPy *in spite of* the fact it's in C. NumPy being
>>> > in C instead of C++ was the biggest negative point when I considered
>>> > whether it was worth contributing to the project. I suspect there
>>> > are many programmers out there who are skilled in low-level,
>>> > high-performance C++, who would be willing to contribute, but don't
>>> > want to code in C.
>>> >
>>> > I believe NumPy should be trying to find people who want to make
>>> > high performance, close to the metal, libraries. This is a very
>>> > different type of programmer than one who wants to program in
>>> > Python, but is willing to dabble in a lower level language to make
>>> > something run faster. High performance library development is one of
>>> > the things the C++ developer community does very well, and that
>>> > community is where we have a good chance of finding the programmers
>>> > NumPy needs.
>>> >
>>> >> I would much rather see development in the direction of sticking
>>> >> with C where direct low-level control and speed are needed, and
>>> >> using cython to gain higher level language benefits where
>>> >> appropriate. Of course, that brings in the danger of reliance on
>>> >> another complex tool, cython. If that danger is considered
>>> >> excessive, then just stick with C.
>>> >
>>> >
>>> > There are many small benefits C++ can offer, even if numpy chooses
>>> > only to use a tiny subset of the C++ language. For example, RAII can
>>> > be used to reliably eliminate PyObject reference leaks.
>>> >
>>> > Consider a regression like this:
>>> > http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057831.html
>>> >
>>> > Fixing this in C would require switching all the relevant usages of
>>> > NPY_MAXARGS to use a dynamic memory allocation. This brings with it
>>> > the potential of easily introducing a memory leak, and is a lot of
>>> > work to do. In C++, this functionality could be placed inside a
>>> > class, where the deterministic construction/destruction semantics
>>> > eliminate the risk of memory leaks and make the code easier to read
>>> > at the same time. There are other examples like this where the C
>>> > language has forced a suboptimal design choice because of how hard
>>> > it would be to do it better.
>>> >
>>> > Cheers,
>>> > Mark
>>> >
>>>
>>> In a similar vein, could incorporating C++ lead to a simpler low-level
>>> API for numpy?
>>
>>
>> This could definitely happen. One way to do it is to have a stable C
>> API which remains fixed over many releases, and a C++ library which is
>> allowed to change significantly at each release. This is what the LLVM
>> project does, for example. OpenCV is an example of another project
>> which was previously just C, but now has an extensive C++ API.
>>
>>
>>> I know Mark has talked before about--in the long-term, as a dream
>>> project to scratch his own itch, and something the BDF12 doesn't
>>> necessarily agree with--implementing the great ideas in numpy as a
>>> layered C++ library. (Which would have the added benefit of making
>>> numpy more of a general array library that could be exposed to any
>>> language which can call C++ libraries.)
>>>
>>> I don't imagine that's on the table for anything near-term, but I
>>> wonder if making more of the low-level stuff C++ would make it easier
>>> for performance nuts to write their own code in C/C++ interfacing with
>>> numpy, and then expose it to python. After playing around with ufuncs
>>> at the C level for a little while last summer, I quickly realized any
>>> simplifications would be greatly appreciated.
>>>
>>
>> This is all possible, yes. The way this typically works is that library
>> authors use advanced C++ techniques to get generality, performance, and
>> usability. The library user can then write code which is very simple
>> and written in a way which makes simple errors very difficult to make
>> compared to using a C-like API.
>>
>
> While the longer compile times are going to annoy me, I don't have a
> strong opinion on using C++. One thing to keep in mind though is
> portability. Numpy is used on many platforms and with many compilers.
> Keeping things working on AIX or with a PathScale compiler for example
> will be a lot more difficult when using C++. Or will support for
> not-so-common platforms be reduced?
>
> Ralf
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>

Ralf makes a good point. During the early numpy development days I was
eternally fighting with Solaris compilers.
It's not really a big issue for us anymore, since we have dropped Solaris
support. But I'm +1 on treating ease of numpy distribution as something to
consider.

Chris
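
For anyone following the RAII point Mark raises in the quoted text, here is a rough sketch of the idea, not code from NumPy. To keep it compilable without the Python headers it uses hypothetical stand-ins: FakeObject, fake_incref, fake_decref, OwnedRef, use_object and self_check are all made-up names for illustration. With the real CPython API one would presumably hold a PyObject* and call Py_XDECREF in the destructor instead.

```cpp
#include <cassert>

// Hypothetical stand-in for a reference-counted C object such as PyObject.
struct FakeObject {
    int refcount;
};

// Illustrative stand-ins for Py_INCREF / Py_DECREF (not the real macros).
inline void fake_incref(FakeObject* o) { ++o->refcount; }
inline void fake_decref(FakeObject* o) { --o->refcount; }

// RAII owner: takes over one existing reference and drops it when the
// owner goes out of scope, whichever return path is taken.
class OwnedRef {
public:
    explicit OwnedRef(FakeObject* o) : obj_(o) {}
    ~OwnedRef() { if (obj_) fake_decref(obj_); }

    // Copying would double-release the reference, so it is disabled.
    OwnedRef(const OwnedRef&) = delete;
    OwnedRef& operator=(const OwnedRef&) = delete;

    FakeObject* get() const { return obj_; }

private:
    FakeObject* obj_;
};

// A function with an early-return error path. In plain C, each such path
// needs its own matching decref; here the destructor covers all of them.
inline bool use_object(FakeObject* borrowed, bool fail_early) {
    fake_incref(borrowed);   // acquire our own reference
    OwnedRef ref(borrowed);  // hand it to the RAII owner
    if (fail_early) {
        return false;        // reference is still released here
    }
    return ref.get() != nullptr;
}

// Self-check: the count is back at its starting value on both paths.
inline void self_check() {
    FakeObject obj = {1};
    use_object(&obj, true);
    assert(obj.refcount == 1);
    use_object(&obj, false);
    assert(obj.refcount == 1);
}
```

The point of the sketch is only that the release lives in one place (the destructor) rather than on every exit path. Whether this is worth the portability cost discussed above is a separate question.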