Re: [Numpy-discussion] How a transition to C++ could work
Hey, Mark

On Feb 18, 2012 11:18 PM, Mark Wiebe mwwi...@gmail.com wrote:

My experience has been that providing a C API from a C++ library is no harder than providing a C API from a C library.

Interfacing to compiled C++ libs has been tricky, so can this concern be dismissed so easily? (Some examples that came to mind were _import_array--easy to fix because it is ours, I guess--or Cython generated code).

A really important point to emphasize is that C++ allows for a strategy where we gradually evolve the codebase to better incorporate its language features. This is what I'm advocating. No massive rewrite, no disruptive changes. Gradual code evolution, with ABI and API compatibility comparable to what we've delivered in 1.6 and the upcoming 1.7 releases.

If we're to switch to C++ (a language that can very easily be wielded in terrible ways), then this certainly seems like a sound approach.

Regards
Stéfan

___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] How a transition to C++ could work
Hi, Thanks for this - it's very helpful. On Sat, Feb 18, 2012 at 11:18 PM, Mark Wiebe mwwi...@gmail.com wrote: The suggestion of transitioning the NumPy core code from C to C++ has sparked a vigorous debate, and I thought I'd start a new thread to give my perspective on some of the issues raised, and describe how such a transition could occur. First, I'd like to reiterate the gcc rationale for their choice to switch: http://gcc.gnu.org/wiki/gcc-in-cxx#Rationale In particular, these points deserve emphasis: The C subset of C++ is just as efficient as C. C++ supports cleaner code in several significant cases. C++ makes it easier to write cleaner interfaces by making it harder to break interface boundaries. C++ never requires uglier code. Some people have pointed out that the Python templating preprocessor used in NumPy is suggestive of C++ templates. A nice advantage of using C++ templates instead of this preprocessor is that third party tools to improve software quality, like static analysis tools, will be able to run directly on the NumPy source code. Additionally, IDEs like XCode and Visual C++ will be able to provide the full suite of tab-completion/intellisense features that programmers working in those environments are accustomed to. There are concerns about ABI/API interoperability and interactions with C++ exceptions. I've dealt with these types of issues on enough platforms to know that while they're important, they're a lot easier to handle than the issues with Fortran, BLAS, and LAPACK in SciPy. My experience has been that providing a C API from a C++ library is no harder than providing a C API from a C library. It's worth comparing the possibility of C++ versus the possibility of other languages, and the ones that have been suggested for consideration are D, Cython, Rust, Fortran 2003, Go, RPython, C# and Java. The target language has to interact naturally with the CPython API. 
It needs to provide direct access to all the various sizes of signed int, unsigned int, and float. It needs to have mature compiler support wherever we want to deploy NumPy. Taken together, these requirements eliminate a majority of these possibilities. From these criteria, the only languages which seem to have a clear possibility for the implementation of Numpy are C, C++, and D. On which criteria did you eliminate Cython? The biggest question for any of these possibilities is how do you get the code from its current state to a state which fully utilizes the target language. C++, being nearly a superset of C, offers a strategy to gradually absorb C++ features. Any of the other language choices requires a rewrite, which would be quite disruptive. Because of all these reasons taken together, I believe the only realistic language to use, other than sticking with C, is C++. Finally, here's what I think is the best strategy for transitioning to C++. First, let's consider what we do if 1.7 becomes an LTS release. 1) Immediately after branching for 1.7, we minimally patch all the .c files so that they can build with a C++ compiler and with a C compiler at the same time. Then we rename all .c -> .cpp, and update the build systems for C++. 2) During the 1.8 development cycle, we heavily restrict C++ feature usage. But, where a feature implementation would be arguably easier and less error-prone with C++, we allow it. This is a period for learning about C++ and how it can benefit NumPy. 3) After the 1.8 release, the community will have developed more experience with C++, and will be in a better position to discuss a way forward. If, for some reason, a 1.7 LTS is unacceptable, it might be a good idea to restrict the 1.8 release to the subset of both C and C++. I would much prefer using the 1.8 development cycle to dip our toes into the C++ world to get some of the low-hanging benefits without doing anything disruptive. 
A really important point to emphasize is that C++ allows for a strategy where we gradually evolve the codebase to better incorporate its language features. This is what I'm advocating. No massive rewrite, no disruptive changes. Gradual code evolution, with ABI and API compatibility comparable to what we've delivered in 1.6 and the upcoming 1.7 releases. Do you have any comment on the need for coding standards when using C++? I saw the warning in: http://gcc.gnu.org/wiki/gcc-in-cxx#Rationale about using C++ unwisely. See you, Matthew
Re: [Numpy-discussion] How a transition to C++ could work
On Sun, Feb 19, 2012 at 2:24 AM, Stéfan van der Walt ste...@sun.ac.za wrote:

Hey, Mark

On Feb 18, 2012 11:18 PM, Mark Wiebe mwwi...@gmail.com wrote:

My experience has been that providing a C API from a C++ library is no harder than providing a C API from a C library.

Interfacing to compiled C++ libs has been tricky, so can this concern be dismissed so easily? (Some examples that came to mind were _import_array--easy to fix because it is ours, I guess--or Cython generated code).

I'm speaking from personal experience, having dealt with these types of issues extensively before. If people have more detailed examples of problems, possibly links to discussions where one of these problems has occurred, that would be helpful. This argument isn't very useful if it's just my positive experience versus others' negative experience; we need to get into specifics to advance the discussion.

-Mark

A really important point to emphasize is that C++ allows for a strategy where we gradually evolve the codebase to better incorporate its language features. This is what I'm advocating. No massive rewrite, no disruptive changes. Gradual code evolution, with ABI and API compatibility comparable to what we've delivered in 1.6 and the upcoming 1.7 releases.

If we're to switch to C++ (a language that can very easily be wielded in terrible ways), then this certainly seems like a sound approach.

Regards
Stéfan
Re: [Numpy-discussion] How a transition to C++ could work
On Sun, Feb 19, 2012 at 2:32 AM, Matthew Brett matthew.br...@gmail.comwrote: Hi, Thanks for this - it's very helpful. On Sat, Feb 18, 2012 at 11:18 PM, Mark Wiebe mwwi...@gmail.com wrote: The suggestion of transitioning the NumPy core code from C to C++ has sparked a vigorous debate, and I thought I'd start a new thread to give my perspective on some of the issues raised, and describe how such a transition could occur. First, I'd like to reiterate the gcc rationale for their choice to switch: http://gcc.gnu.org/wiki/gcc-in-cxx#Rationale In particular, these points deserve emphasis: The C subset of C++ is just as efficient as C. C++ supports cleaner code in several significant cases. C++ makes it easier to write cleaner interfaces by making it harder to break interface boundaries. C++ never requires uglier code. Some people have pointed out that the Python templating preprocessor used in NumPy is suggestive of C++ templates. A nice advantage of using C++ templates instead of this preprocessor is that third party tools to improve software quality, like static analysis tools, will be able to run directly on the NumPy source code. Additionally, IDEs like XCode and Visual C++ will be able to provide the full suite of tab-completion/intellisense features that programmers working in those environments are accustomed to. There are concerns about ABI/API interoperability and interactions with C++ exceptions. I've dealt with these types of issues on enough platforms to know that while they're important, they're a lot easier to handle than the issues with Fortran, BLAS, and LAPACK in SciPy. My experience has been that providing a C API from a C++ library is no harder than providing a C API from a C library. It's worth comparing the possibility of C++ versus the possibility of other languages, and the ones that have been suggested for consideration are D, Cython, Rust, Fortran 2003, Go, RPython, C# and Java. The target language has to interact naturally with the CPython API. 
It needs to provide direct access to all the various sizes of signed int, unsigned int, and float. It needs to have mature compiler support wherever we want to deploy NumPy. Taken together, these requirements eliminate a majority of these possibilities. From these criteria, the only languages which seem to have a clear possibility for the implementation of Numpy are C, C++, and D. On which criteria did you eliminate Cython? The mature compiler support one. As glue between C/C++ and Python, it looks great, but Dag's evaluation of Cython's maturity for implementing the style of functionality in NumPy seems pretty authoritative. So people don't have to dig through the giant email thread, here's the specific message content from Dag, and its context: On 02/18/2012 12:35 PM, Charles R Harris wrote: No one in their right mind would build a large performance library using Cython, it just isn't the right tool. For what it was designed for - wrapping existing c code or writing small and simple things close to Python - it does very well, but it was never designed for making core C/C++ libraries and in that role it just gets in the way. +1. Even I who have contributed to Cython realize this; last autumn I implemented a library by writing it in C and wrapping it in Cython. The biggest question for any of these possibilities is how do you get the code from its current state to a state which fully utilizes the target language. C++, being nearly a superset of C, offers a strategy to gradually absorb C++ features. Any of the other language choices requires a rewrite, which would be quite disruptive. Because of all these reasons taken together, I believe the only realistic language to use, other than sticking with C, is C++. Finally, here's what I think is the best strategy for transitioning to C++. First, let's consider what we do if 1.7 becomes an LTS release. 
1) Immediately after branching for 1.7, we minimally patch all the .c files so that they can build with a C++ compiler and with a C compiler at the same time. Then we rename all .c -> .cpp, and update the build systems for C++. 2) During the 1.8 development cycle, we heavily restrict C++ feature usage. But, where a feature implementation would be arguably easier and less error-prone with C++, we allow it. This is a period for learning about C++ and how it can benefit NumPy. 3) After the 1.8 release, the community will have developed more experience with C++, and will be in a better position to discuss a way forward. If, for some reason, a 1.7 LTS is unacceptable, it might be a good idea to restrict the 1.8 release to the subset of both C and C++. I would much prefer using the 1.8 development cycle to dip our toes into the C++ world to get some of the low-hanging benefits without doing anything disruptive. A really important
Re: [Numpy-discussion] How a transition to C++ could work
Hi Mark, thank you for joining this discussion. On Sun, Feb 19, 2012 at 7:18 AM, Mark Wiebe mwwi...@gmail.com wrote: The suggestion of transitioning the NumPy core code from C to C++ has sparked a vigorous debate, and I thought I'd start a new thread to give my perspective on some of the issues raised, and describe how such a transition could occur. First, I'd like to reiterate the gcc rationale for their choice to switch: http://gcc.gnu.org/wiki/gcc-in-cxx#Rationale In particular, these points deserve emphasis: The C subset of C++ is just as efficient as C. C++ supports cleaner code in several significant cases. C++ makes it easier to write cleaner interfaces by making it harder to break interface boundaries. C++ never requires uglier code. I think those arguments will not be very useful: they are subjective, and unlikely to convince people who prefer C to C++. There are concerns about ABI/API interoperability and interactions with C++ exceptions. I've dealt with these types of issues on enough platforms to know that while they're important, they're a lot easier to handle than the issues with Fortran, BLAS, and LAPACK in SciPy. My experience has been that providing a C API from a C++ library is no harder than providing a C API from a C library. This needs more details. I have some experience in both areas as well, and mine is quite different. Reiterating a few examples that worry me: - how can you ensure that exceptions happening in C++ will never cross different .so/.dll ? How can one make sure C++ extensions built by different compilers can work ? Is not using exceptions like it is done in zeromq acceptable ? (would be nice to find out more about the decisions made by the zeromq team about their usage of C++). I cannot find a recent example, but I have seen errors similar to this(http://software.intel.com/en-us/forums/showthread.php?t=42940) quite a few times. - how can you expose in C some heavily-using C++ features ? 
I would expect you would like to use templates for iterators in numpy - how can you make them available to 3rd party extensions without requiring C++? It's worth comparing the possibility of C++ versus the possibility of other languages, and the ones that have been suggested for consideration are D, Cython, Rust, Fortran 2003, Go, RPython, C# and Java. The target language has to interact naturally with the CPython API. It needs to provide direct access to all the various sizes of signed int, unsigned int, and float. It needs to have mature compiler support wherever we want to deploy NumPy. Taken together, these requirements eliminate a majority of these possibilities. From these criteria, the only languages which seem to have a clear possibility for the implementation of Numpy are C, C++, and D. For D, I suspect the tooling is not mature enough, but I'm not 100% certain of that. While I agree that no other language is realistic, staying in C has the nice advantage that we can more easily use one of them if they mature (rust/D - go, rpython, C#/java can be dismissed for fundamental technical reasons right away). This is not a very strong argument against using C++, obviously. 1) Immediately after branching for 1.7, we minimally patch all the .c files so that they can build with a C++ compiler and with a C compiler at the same time. Then we rename all .c -> .cpp, and update the build systems for C++. 2) During the 1.8 development cycle, we heavily restrict C++ feature usage. But, where a feature implementation would be arguably easier and less error-prone with C++, we allow it. This is a period for learning about C++ and how it can benefit NumPy. 3) After the 1.8 release, the community will have developed more experience with C++, and will be in a better position to discuss a way forward. A step that would be useful sooner rather than later is one where numpy has been split into smaller extensions (instead of multiarray/ufunc, essentially). 
This would help avoid recompiling lots of code for any small change. It is already quite painful with C, but with C++, it will be unbearable. This can be done in C, and would be useful whether the decision to move to C++ is accepted or not. cheers, David
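As a rough illustration of step 1 in the plan quoted above (patching .c files so that they build with both a C and a C++ compiler), the usual changes are explicit casts on `void *` results, avoiding C++ keywords as identifiers, and `extern "C"` guards around exported functions. The sketch below is hypothetical: `demo_buffer` and its functions are invented for illustration and are not NumPy code.

```cpp
/* Hypothetical example (not NumPy source): written in the common subset of
 * C and C++, so the same file should build with either kind of compiler. */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#ifdef __cplusplus
extern "C" {   /* keep unmangled C linkage when compiled as C++ */
#endif

typedef struct demo_buffer {
    double *data;
    size_t size;
} demo_buffer;

/* C converts void* to other pointer types implicitly; C++ does not,
 * so the results of malloc/calloc are cast explicitly. */
demo_buffer *demo_buffer_new(size_t size)
{
    demo_buffer *buf = (demo_buffer *)malloc(sizeof(demo_buffer));
    if (buf == NULL) {
        return NULL;
    }
    /* Allocate at least one element so a NULL result always means failure. */
    buf->data = (double *)calloc(size ? size : 1, sizeof(double));
    if (buf->data == NULL) {
        free(buf);
        return NULL;
    }
    buf->size = size;
    return buf;
}

/* Bounds-checked read; note that names such as "new" or "class" are
 * avoided because they are keywords in C++. */
double demo_buffer_get(const demo_buffer *buf, size_t index)
{
    assert(buf != NULL && index < buf->size);
    return buf->data[index];
}

void demo_buffer_free(demo_buffer *buf)
{
    if (buf != NULL) {
        free(buf->data);
        free(buf);
    }
}

#ifdef __cplusplus
}
#endif
```

Nothing here uses a C++ feature; the point is only that the file compiles unchanged under both compilers, which is what the renaming step would rely on.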
Re: [Numpy-discussion] How a transition to C++ could work
On Sun, Feb 19, 2012 at 2:51 AM, Stéfan van der Walt ste...@sun.ac.za wrote:

On Feb 19, 2012 12:34 AM, Mark Wiebe mwwi...@gmail.com wrote:

I'm speaking from personal experience, having dealt with these types of issues extensively before. If people have more detailed examples of problems, possibly links to discussions where one of these problems has occurred, that would be helpful. This argument isn't very useful if it's just my positive experience versus others' negative experience; we need to get into specifics to advance the discussion.

Wow, the NumPy list has gotten so serious :) I'm certainly not doubting anyone's experience--just trying to get a handle on possible transition risks.

Heh, when threads get longer than 50 messages, I think that's a sign something is serious!

OK, so let's talk specifics: how do you dynamically grab a function pointer to a compiled C++ library, a la ctypes? Feel free to point me to StackOverflow or elsewhere.

If the C++ library is exposing a C-API, it's identical to the case for C. If it's not, and you must access the functions via ctypes anyway, you need to determine the mangled name of the function. The mangled name encodes the types of the parameters, to support function overloading, and the mangling scheme differs between compilers and platforms. Also, if the function takes a class object as a parameter, or returns one, ctypes doesn't give you a way to forward that parameter. In general, the standard advice is to wrap the C++ library using Boost.Python, Cython, or something similar. Dealing directly with the mangled names, while possible, is not likely to make you happy.

Cheers,
Mark

Stéfan
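To make the ctypes answer above concrete, here is a minimal sketch of a C++ library that exposes a C API through `extern "C"` wrappers. All names (`demo::scale`, `demo_scale_int`, `demo_greet`, and so on) are hypothetical and chosen only for illustration. The overloaded C++ functions get mangled symbol names, while each wrapper has a plain, unmangled name and uses only C types in its signature.

```cpp
// Hypothetical library code; the names are illustrative, not a real API.
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

namespace demo {
// Overloads like these get compiler-specific mangled symbol names.
int scale(int x, int factor) { return x * factor; }
double scale(double x, double factor) { return x * factor; }
// Returns a class type, which ctypes cannot receive directly.
std::string greet(const std::string &name) { return "hello, " + name; }
}  // namespace demo

extern "C" {

// One unmangled symbol per overload, with only C types in the signature.
int demo_scale_int(int x, int factor) { return demo::scale(x, factor); }
double demo_scale_double(double x, double factor) { return demo::scale(x, factor); }

// The std::string result is copied into a caller-provided buffer.
// Returns the full length of the greeting (excluding the terminating NUL),
// or -1 on error. No exception is allowed to escape.
int demo_greet(const char *name, char *out, int out_size) {
    if (name == nullptr) {
        return -1;
    }
    try {
        const std::string s = demo::greet(name);
        if (out != nullptr && out_size > 0) {
            const std::size_t room = static_cast<std::size_t>(out_size - 1);
            const std::size_t n = std::min(s.size(), room);
            std::memcpy(out, s.data(), n);
            out[n] = '\0';
        }
        return static_cast<int>(s.size());
    } catch (...) {
        return -1;
    }
}

}  // extern "C"
```

Assuming this were built into a shared library, ctypes could look up `demo_scale_int` by that exact name, whereas `demo::scale` would only be reachable through its mangled symbol.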
Re: [Numpy-discussion] How a transition to C++ could work
Hi, On Sun, Feb 19, 2012 at 12:49 AM, Mark Wiebe mwwi...@gmail.com wrote: On Sun, Feb 19, 2012 at 2:32 AM, Matthew Brett matthew.br...@gmail.com wrote: Hi, Thanks for this - it's very helpful. On Sat, Feb 18, 2012 at 11:18 PM, Mark Wiebe mwwi...@gmail.com wrote: The suggestion of transitioning the NumPy core code from C to C++ has sparked a vigorous debate, and I thought I'd start a new thread to give my perspective on some of the issues raised, and describe how such a transition could occur. First, I'd like to reiterate the gcc rationale for their choice to switch: http://gcc.gnu.org/wiki/gcc-in-cxx#Rationale In particular, these points deserve emphasis: The C subset of C++ is just as efficient as C. C++ supports cleaner code in several significant cases. C++ makes it easier to write cleaner interfaces by making it harder to break interface boundaries. C++ never requires uglier code. Some people have pointed out that the Python templating preprocessor used in NumPy is suggestive of C++ templates. A nice advantage of using C++ templates instead of this preprocessor is that third party tools to improve software quality, like static analysis tools, will be able to run directly on the NumPy source code. Additionally, IDEs like XCode and Visual C++ will be able to provide the full suite of tab-completion/intellisense features that programmers working in those environments are accustomed to. There are concerns about ABI/API interoperability and interactions with C++ exceptions. I've dealt with these types of issues on enough platforms to know that while they're important, they're a lot easier to handle than the issues with Fortran, BLAS, and LAPACK in SciPy. My experience has been that providing a C API from a C++ library is no harder than providing a C API from a C library. 
It's worth comparing the possibility of C++ versus the possibility of other languages, and the ones that have been suggested for consideration are D, Cython, Rust, Fortran 2003, Go, RPython, C# and Java. The target language has to interact naturally with the CPython API. It needs to provide direct access to all the various sizes of signed int, unsigned int, and float. It needs to have mature compiler support wherever we want to deploy NumPy. Taken together, these requirements eliminate a majority of these possibilities. From these criteria, the only languages which seem to have a clear possibility for the implementation of Numpy are C, C++, and D. On which criteria did you eliminate Cython? The mature compiler support one. I took you to mean that the code would compile on any platform. As Cython compiles to C, I think Cython passes, if that is what you meant. Maybe you meant you thought that Cython was not mature in some sense, but if so, I'm not sure which sense you mean. As glue between C/C++ and Python, it looks great, but Dag's evaluation of Cython's maturity for implementing the style of functionality in NumPy seems pretty authoritative. So people don't have to dig through the giant email thread, here's the specific message content from Dag, and its context: On 02/18/2012 12:35 PM, Charles R Harris wrote: No one in their right mind would build a large performance library using Cython, it just isn't the right tool. For what it was designed for - wrapping existing c code or writing small and simple things close to Python - it does very well, but it was never designed for making core C/C++ libraries and in that role it just gets in the way. +1. Even I who have contributed to Cython realize this; last autumn I implemented a library by writing it in C and wrapping it in Cython. As you probably saw, I think the proposal was indeed to use Cython to provide the higher-level parts of the core, while refactoring the rest of the C code underneath it. 
Obviously one could also refactor the C into C++, so the proposal to use Cython for some of the core is to some extent orthogonal to the choice of C / C++. I don't know the core; perhaps there isn't much of it that would benefit from being in Cython. I'd be interested to know your views. But, superficially, it seems like an attractive solution to making (some of) the core easier to maintain. Best, Matthew
Re: [Numpy-discussion] How a transition to C++ could work
Date: Sun, 19 Feb 2012 01:18:20 -0600 From: Mark Wiebe mwwi...@gmail.com Subject: [Numpy-discussion] How a transition to C++ could work To: Discussion of Numerical Python NumPy-Discussion@scipy.org Message-ID: CAMRnEmpVTmt=kdurpzktgui516oqtqd4vazm746hmpqgpfx...@mail.gmail.com Content-Type: text/plain; charset=utf-8 The suggestion of transitioning the NumPy core code from C to C++ has sparked a vigorous debate, and I thought I'd start a new thread to give my perspective on some of the issues raised, and describe how such a transition could occur. First, I'd like to reiterate the gcc rationale for their choice to switch: http://gcc.gnu.org/wiki/gcc-in-cxx#Rationale In particular, these points deserve emphasis: - The C subset of C++ is just as efficient as C. - C++ supports cleaner code in several significant cases. - C++ makes it easier to write cleaner interfaces by making it harder to break interface boundaries. - C++ never requires uglier code. I think they're trying to solve a different problem. I thought the problem that numpy was trying to solve is to make inner loops of numerical algorithms very fast. C is great for this because you can write C code and picture precisely what assembly code will be generated. C++ removes some of this advantage -- now there is extra code generated by the compiler to handle constructors, destructors, operators etc., which can make a material difference to fast inner loops. So you end up just writing C-style anyway. On the other hand, if your problem really is to write lots of OO code with virtual methods and have it turned into machine code (probably like the GCC guys), then maybe C++ is the way to go. Some more opinions on C++: http://gigamonkeys.wordpress.com/2009/10/16/coders-c-plus-plus/ Sorry if this all seems a bit negative about C++. It's just been my experience that C++ adds complexity while C keeps things nice and simple. Looking forward to seeing some more concrete examples. 
Cheers Ben
Re: [Numpy-discussion] How a transition to C++ could work
On Sun, Feb 19, 2012 at 2:56 AM, David Cournapeau courn...@gmail.com wrote: Hi Mark, thank you for joining this discussion. On Sun, Feb 19, 2012 at 7:18 AM, Mark Wiebe mwwi...@gmail.com wrote: The suggestion of transitioning the NumPy core code from C to C++ has sparked a vigorous debate, and I thought I'd start a new thread to give my perspective on some of the issues raised, and describe how such a transition could occur. First, I'd like to reiterate the gcc rationale for their choice to switch: http://gcc.gnu.org/wiki/gcc-in-cxx#Rationale In particular, these points deserve emphasis: The C subset of C++ is just as efficient as C. C++ supports cleaner code in several significant cases. C++ makes it easier to write cleaner interfaces by making it harder to break interface boundaries. C++ never requires uglier code. I think those arguments will not be very useful: they are subjective, and unlikely to convince people who prefer C to C++. They are arguments from a team which implements both a C and a C++ compiler. In the spectrum of possible authorities on the matter, they rate about as high as I can imagine. There are concerns about ABI/API interoperability and interactions with C++ exceptions. I've dealt with these types of issues on enough platforms to know that while they're important, they're a lot easier to handle than the issues with Fortran, BLAS, and LAPACK in SciPy. My experience has been that providing a C API from a C++ library is no harder than providing a C API from a C library. This needs more details. I have some experience in both areas as well, and mine is quite different. Reiterating a few examples that worry me: - how can you ensure that exceptions happening in C++ will never cross different .so/.dll ? This is a necessary part of providing a C API, and is included as a requirement of doing that. All C++ libraries which expose a C API deal with this. How can one make sure C++ extensions built by different compilers can work ? 
This is no different from the situation in C. Already in C on Windows, one can't build NumPy with a different version of Visual C++ than the one used to build CPython. Is not using exceptions like it is done in zeromq acceptable ? (would be nice to find out more about the decisions made by the zeromq team about their usage of C++). I prefer to use exceptions in C++, but some major projects have decided to disable them. LLVM/Clang is the most notable example. My experience working with high-performance graphics code has been that appropriate use of exceptions (i.e. not doing something like using them for control flow) does not pose a problem. I cannot find a recent example, but I have seen errors similar to this (http://software.intel.com/en-us/forums/showthread.php?t=42940) quite a few times. This kind of thing would happen when using 'new' to allocate memory, and with the compiler setting enabled to raise bad_alloc on such allocation failures (the default for most compilers nowadays). If exception handling is disabled in the compiler, new will return NULL instead. Unless the compiler has a bizarre issue, catching either std::exception or std::bad_alloc specifically within NumPy should be sufficient to deal with it. Also note that the possibility of something like this will only arise once more advanced C++ features are being adopted. - how can you expose in C some heavily-using C++ features ? If the advantages of those C++ features depend on the C++ language, you have to map them to a limited subset of the feature in C. For example, if a feature is based on a C++ template, you can instantiate specific instances of the template for all the types you want to support from C. I would expect you would like to use templates for iterators in numpy - how can you make them available to 3rd party extensions without requiring C++? Yes, something like the nditer is a good example. 
From C, it would have to retain an API in the current style, but C++ users could gain an easier-to-use variant. It's worth comparing the possibility of C++ versus the possibility of other languages, and the ones that have been suggested for consideration are D, Cython, Rust, Fortran 2003, Go, RPython, C# and Java. The target language has to interact naturally with the CPython API. It needs to provide direct access to all the various sizes of signed int, unsigned int, and float. It needs to have mature compiler support wherever we want to deploy NumPy. Taken together, these requirements eliminate a majority of these possibilities. From these criteria, the only languages which seem to have a clear possibility for the implementation of Numpy are C, C++, and D. For D, I suspect the tooling is not mature enough, but I'm not 100% certain of that. While I agree that no other language is realistic, staying in C has the nice advantage that we can more easily use one of them if
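The point above about keeping exceptions from crossing a library boundary can be sketched as follows. This is only an illustration under assumed names (`demo_status`, `demo_fill_ramp`, and `fill_ramp` are hypothetical, not NumPy API): the internal C++ code throws freely, and the exported function catches everything and reports a status code instead.

```cpp
// Hypothetical sketch: exceptions are caught at the C API boundary and
// translated into status codes, so none can propagate into the caller.
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <exception>
#include <new>
#include <stdexcept>
#include <vector>

enum demo_status { DEMO_OK = 0, DEMO_NOMEM = 1, DEMO_ERROR = 2 };

namespace {
// Internal C++ code: free to throw.
void fill_ramp(double *out, std::size_t n) {
    if (out == nullptr) {
        throw std::invalid_argument("out must not be NULL");
    }
    std::vector<double> scratch(n);  // may throw std::bad_alloc
    for (std::size_t i = 0; i < n; ++i) {
        scratch[i] = static_cast<double>(i);
    }
    std::copy(scratch.begin(), scratch.end(), out);
}
}  // namespace

// Exported C function: every exception is converted to a return value.
extern "C" int demo_fill_ramp(double *out, std::size_t n) {
    try {
        fill_ramp(out, n);
        return DEMO_OK;
    } catch (const std::bad_alloc &) {
        return DEMO_NOMEM;
    } catch (const std::exception &) {
        return DEMO_ERROR;
    } catch (...) {
        return DEMO_ERROR;  // non-standard exception types
    }
}
```

In a CPython extension the catch blocks would more plausibly set a Python error before returning, but that detail is left out to keep the sketch self-contained.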
Re: [Numpy-discussion] How a transition to C++ could work
On Sun, Feb 19, 2012 at 3:10 AM, Matthew Brett matthew.br...@gmail.com wrote:

snip

As you probably saw, I think the proposal was indeed to use Cython to provide the higher-level parts of the core, while refactoring the rest of the C code underneath it. Obviously one could also refactor the C into C++, so the proposal to use Cython for some of the core is to some extent orthogonal to the choice of C / C++. I don't know the core; perhaps there isn't much of it that would benefit from being in Cython. I'd be interested to know your views. But, superficially, it seems like an attractive solution to making (some of) the core easier to maintain.

Using Cython in the binding role is orthogonal to the choice of C versus C++, you are right. This binding aspect isn't the part where C++ provides most of the benefits I envision, so increasing (or decreasing) the use of Cython within NumPy seems like a good topic for a separate thread just about Cython.

Cheers,
Mark

Best,
Matthew
Re: [Numpy-discussion] How a transition to C++ could work
On Sun, Feb 19, 2012 at 9:19 AM, Mark Wiebe mwwi...@gmail.com wrote: On Sun, Feb 19, 2012 at 2:56 AM, David Cournapeau courn...@gmail.com wrote: Hi Mark, thank you for joining this discussion. On Sun, Feb 19, 2012 at 7:18 AM, Mark Wiebe mwwi...@gmail.com wrote: The suggestion of transitioning the NumPy core code from C to C++ has sparked a vigorous debate, and I thought I'd start a new thread to give my perspective on some of the issues raised, and describe how such a transition could occur. First, I'd like to reiterate the gcc rationale for their choice to switch: http://gcc.gnu.org/wiki/gcc-in-cxx#Rationale In particular, these points deserve emphasis: The C subset of C++ is just as efficient as C. C++ supports cleaner code in several significant cases. C++ makes it easier to write cleaner interfaces by making it harder to break interface boundaries. C++ never requires uglier code. I think those arguments will not be very useful: they are subjective, and unlikely to convince people who prefer C to C++. They are arguments from a team which implements both a C and a C++ compiler. In the spectrum of possible authorities on the matter, they rate about as high as I can imagine. There are quite a few people who are just as authoritative and think those arguments are not very strong. They are as unlikely to change your mind as gcc's arguments are to convince me, I am afraid. This is a necessary part of providing a C API, and is included as a requirement of doing that. All C++ libraries which expose a C API deal with this. The only two examples given so far of a C library around C++ code (clang and zeromq) do not use exceptions. Can you provide an example of a C++ library that has a C API and does use exceptions? If not, I would like to know the technical details if you don't mind expanding on them. How can one make sure C++ extensions built by different compilers can work ? This is no different from the situation in C. 
Already in C on Windows, one can't build NumPy with a different version of Visual C++ than the one used to build CPython. This is a different situation. On windows, the mismatch between VS is due to the way win32 has been used by python itself - it could actually be fixed eventually by python (there are efforts in that regard). It is not a language issue. Except for that case, numpy has a pretty good record of allowing people to mix and match compilers. Using mingw on windows and intel compilers on linux are the typical cases, but not the only ones. I would expect you would like to use templates for iterators in numpy - you can you make them available to 3rd party extensions without requiring C++. Yes, something like the nditer is a good example. From C, it would have to retain an API in the current style, but C++ users could gain an easier-to-use variant. Providing an official C++ library on top of the current C API would certainly be nice for people who prefer C++ to C. But this is quite different from using C++ at the core. The current way iterators work would be very hard (if at all possible ?) to rewrite in idiomatic in C++ while keeping even API compatibility with the existing C one. For numpy 2.0, we can somehow relax on this. If it is not too time consuming, could you show a simplified example of how it would work to write the iterator in C++ while providing a C API in the spirit of what we have now ? David ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
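As a rough illustration of the kind of example David asks for, here is a minimal sketch of a C++ class exposed through a C-style API with an opaque handle, where no exception is allowed to cross the boundary. Everything in it is hypothetical and unrelated to the real nditer: the names `demo::RangeIter`, `demo_iter_new`, `demo_iter_next`, `demo_iter_free` and `demo_last_error` are made up for this sketch, and a plain error buffer stands in for setting a Python exception.

```cpp
#include <cstring>
#include <exception>
#include <new>
#include <stdexcept>

namespace demo {

// C++ side (hypothetical): free to use constructors and exceptions.
class RangeIter {
public:
    RangeIter(long start, long stop, long step)
        : current_(start), stop_(stop), step_(step) {
        if (step <= 0) {
            throw std::invalid_argument("step must be positive");
        }
    }

    // Writes the next value and returns true while items remain.
    bool next(long &out) {
        if (current_ >= stop_) {
            return false;
        }
        out = current_;
        current_ += step_;
        return true;
    }

private:
    long current_;
    long stop_;
    long step_;
};

// Fixed-size error buffer; stands in for PyErr_SetString in this sketch.
char last_error[128] = "";

void set_error(const char *msg) noexcept {
    std::strncpy(last_error, msg, sizeof(last_error) - 1);
    last_error[sizeof(last_error) - 1] = '\0';
}

}  // namespace demo

// C callers only ever hold a pointer to this type and never look inside.
struct demo_iter {
    demo::RangeIter impl;
    demo_iter(long start, long stop, long step) : impl(start, stop, step) {}
};

// Returns a new iterator, or a null pointer with the error message set.
extern "C" demo_iter *demo_iter_new(long start, long stop, long step) {
    try {
        return new demo_iter(start, stop, step);
    } catch (const std::bad_alloc &) {
        demo::set_error("out of memory");
    } catch (const std::exception &e) {
        demo::set_error(e.what());
    } catch (...) {
        demo::set_error("unknown C++ exception");
    }
    return nullptr;
}

// Returns 1 if a value was produced, 0 at the end, -1 on error.
extern "C" int demo_iter_next(demo_iter *it, long *out) {
    try {
        return it->impl.next(*out) ? 1 : 0;
    } catch (...) {
        demo::set_error("unknown C++ exception");
        return -1;
    }
}

extern "C" void demo_iter_free(demo_iter *it) { delete it; }

extern "C" const char *demo_last_error(void) { return demo::last_error; }
```

A C caller would see only the four `extern "C"` functions and a forward declaration of `struct demo_iter`, so the calling convention stays the same as for a library written in C.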
Re: [Numpy-discussion] How a transition to C++ could work
On Sun, Feb 19, 2012 at 3:10 AM, Ben Walsh ben_w_...@yahoo.co.uk wrote: Date: Sun, 19 Feb 2012 01:18:20 -0600 From: Mark Wiebe mwwi...@gmail.com Subject: [Numpy-discussion] How a transition to C++ could work To: Discussion of Numerical Python NumPy-Discussion@scipy.org Message-ID: CAMRnEmpVTmt= kdurpzktgui516oqtqd4vazm746hmpqgpfx...@mail.gmail.com Content-Type: text/plain; charset=utf-8 The suggestion of transitioning the NumPy core code from C to C++ has sparked a vigorous debate, and I thought I'd start a new thread to give my perspective on some of the issues raised, and describe how such a transition could occur. First, I'd like to reiterate the gcc rationale for their choice to switch: http://gcc.gnu.org/wiki/gcc-in-cxx#Rationale In particular, these points deserve emphasis: - The C subset of C++ is just as efficient as C. - C++ supports cleaner code in several significant cases. - C++ makes it easier to write cleaner interfaces by making it harder to break interface boundaries. - C++ never requires uglier code. I think they're trying to solve a different problem. I thought the problem that numpy was trying to solve is make inner loops of numerical algorithms very fast. C is great for this because you can write C code and picture precisely what assembly code will be generated. What you're describing is also the C subset of C++, so your experience applies just as well to C++! C++ removes some of this advantage -- now there is extra code generated by the compiler to handle constructors, destructors, operators etc which can make a material difference to fast inner loops. So you end up just writing C-style anyway. This is in fact not true, and writing in C++ style can often produce faster code. A classic example of this is C qsort vs C++ std::sort. You may be thinking of using virtual functions in a class hierarchy, where a tradeoff between performance and run-time polymorphism is being done. 
Emulating the functionality that virtual functions provide in C will give similar performance characteristics as the C++ language feature itself. On the other hand, if your problem really is write lots of OO code with virtual methods and have it turned into machine code (probably like the GCC guys) then maybe C++ is the way to go. Managing the complexity of the dtype subsystem, the ufunc subsystem, the nditer component, and other parts of NumPy could benefit from C++ Not in a stereotypical OO code with virtual methods way, that is not how typical modern C++ is done. Some more opinions on C++: http://gigamonkeys.wordpress.com/2009/10/16/coders-c-plus-plus/ Sorry if this all seems a bit negative about C++. It's just been my experience that C++ adds complexity while C keeps things nice and simple. Yes, there are lots of negative opinions about C++ out there, it's true. Just like there are negative opinions about C, Java, C#, and any other language which has become popular. My experience with regard to complexity and C vs C++ is that C forces the complexity of dealing with resource lifetimes out into all the code everyone writes, while C++ allows one to encapsulate that sort of complexity into a class which is small and more easily verifiable. This is about code quality, and the best quality C++ code I've worked with has been way easier to program in than the best quality C code I've worked with. Looking forward to seeing some more concrete examples. In the interests of starting small, here's one that I mentioned in the other thread: Consider a regression like this: http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057831.html Fixing this in C would require switching all the relevant usages of NPY_MAXARGS to use a dynamic memory allocation. This brings with it the potential of easily introducing a memory leak, and is a lot of work to do. 
In C++, this functionality could be placed inside a class, where the deterministic construction/destruction semantics eliminate the risk of memory leaks and make the code easier to read at the same time. Cheers, Mark Cheers Ben
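The idea of wrapping a dynamic allocation in a class can be sketched roughly as follows. This is not NumPy code: `ScratchArray` and `sum_of_squares` are hypothetical names, the class assumes plain-old-data element types, and in practice `std::vector` would usually serve the same purpose.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical helper: owns a malloc'ed array of `count` plain-old-data
// items and releases it in the destructor, on every exit path.
template <typename T>
class ScratchArray {
public:
    explicit ScratchArray(std::size_t count)
        : data_(static_cast<T *>(std::malloc(count * sizeof(T)))),
          size_(count) {
        if (data_ == nullptr && count != 0) {
            throw std::bad_alloc();
        }
    }

    ~ScratchArray() { std::free(data_); }

    // Copying is disabled so the buffer cannot be freed twice.
    ScratchArray(const ScratchArray &) = delete;
    ScratchArray &operator=(const ScratchArray &) = delete;

    T &operator[](std::size_t i) {
        assert(i < size_);
        return data_[i];
    }

    std::size_t size() const { return size_; }

private:
    T *data_;
    std::size_t size_;
};

// Where C code might declare `long tmp[SOME_FIXED_MAX];`, the buffer here
// is sized at run time and needs no explicit cleanup code.
long sum_of_squares(std::size_t nargs) {
    ScratchArray<long> squares(nargs);
    for (std::size_t i = 0; i < nargs; ++i) {
        squares[i] = static_cast<long>(i * i);
    }
    long total = 0;
    for (std::size_t i = 0; i < squares.size(); ++i) {
        total += squares[i];
    }
    return total;  // destructor frees the buffer here
}
```

Any early `return` or thrown exception added to `sum_of_squares` later would still release the buffer, which is the property being argued for.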
Re: [Numpy-discussion] How a transition to C++ could work
On Sun, Feb 19, 2012 at 9:52 AM, Mark Wiebe mwwi...@gmail.com wrote: On Sun, Feb 19, 2012 at 3:10 AM, Ben Walsh ben_w_...@yahoo.co.uk wrote: Date: Sun, 19 Feb 2012 01:18:20 -0600 From: Mark Wiebe mwwi...@gmail.com Subject: [Numpy-discussion] How a transition to C++ could work To: Discussion of Numerical Python NumPy-Discussion@scipy.org Message-ID: CAMRnEmpVTmt=kdurpzktgui516oqtqd4vazm746hmpqgpfx...@mail.gmail.com Content-Type: text/plain; charset=utf-8 The suggestion of transitioning the NumPy core code from C to C++ has sparked a vigorous debate, and I thought I'd start a new thread to give my perspective on some of the issues raised, and describe how such a transition could occur. First, I'd like to reiterate the gcc rationale for their choice to switch: http://gcc.gnu.org/wiki/gcc-in-cxx#Rationale In particular, these points deserve emphasis: - The C subset of C++ is just as efficient as C. - C++ supports cleaner code in several significant cases. - C++ makes it easier to write cleaner interfaces by making it harder to break interface boundaries. - C++ never requires uglier code. I think they're trying to solve a different problem. I thought the problem that numpy was trying to solve is make inner loops of numerical algorithms very fast. C is great for this because you can write C code and picture precisely what assembly code will be generated. What you're describing is also the C subset of C++, so your experience applies just as well to C++! C++ removes some of this advantage -- now there is extra code generated by the compiler to handle constructors, destructors, operators etc which can make a material difference to fast inner loops. So you end up just writing C-style anyway. This is in fact not true, and writing in C++ style can often produce faster code. A classic example of this is C qsort vs C++ std::sort. You may be thinking of using virtual functions in a class hierarchy, where a tradeoff between performance and run-time polymorphism is being done. 
Emulating the functionality that virtual functions provide in C will give similar performance characteristics as the C++ language feature itself. On the other hand, if your problem really is write lots of OO code with virtual methods and have it turned into machine code (probably like the GCC guys) then maybe C++ is the way to go. Managing the complexity of the dtype subsystem, the ufunc subsystem, the nditer component, and other parts of NumPy could benefit from C++ Not in a stereotypical OO code with virtual methods way, that is not how typical modern C++ is done. Some more opinions on C++: http://gigamonkeys.wordpress.com/2009/10/16/coders-c-plus-plus/ Sorry if this all seems a bit negative about C++. It's just been my experience that C++ adds complexity while C keeps things nice and simple. Yes, there are lots of negative opinions about C++ out there, it's true. Just like there are negative opinions about C, Java, C#, and any other language which has become popular. My experience with regard to complexity and C vs C++ is that C forces the complexity of dealing with resource lifetimes out into all the code everyone writes, while C++ allows one to encapsulate that sort of complexity into a class which is small and more easily verifiable. This is about code quality, and the best quality C++ code I've worked with has been way easier to program in than the best quality C code I've worked with. While I actually believe this to be true (very good C++ can be easier to read/use than very good C). Good C is also much more common than good C++, at least in open source. On the good C++ codebases you have been working on, could you rely on everybody being a very good C++ programmer ? Because this will most likely never happen for numpy. This is the crux of the argument from an organizational POV: the variance in C++ code quality is much more difficult to control. 
I have seen C++ code that is certainly much poorer and more complex than numpy, to a point where not much could be done to save the codebase. cheers, David
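The qsort versus std::sort comparison quoted earlier in this message can be made concrete with a small sketch. The function names are made up for illustration. The point is only that `qsort` calls its comparison through a function pointer on untyped memory, while `std::sort` is a template that knows the element type and can often inline the comparison; actual timings depend on the compiler and the data, and none are claimed here.

```cpp
#include <algorithm>
#include <cstdlib>
#include <vector>

// Comparison callback for qsort: invoked through a function pointer,
// with both arguments passed as untyped pointers.
static int compare_doubles(const void *a, const void *b) {
    const double x = *static_cast<const double *>(a);
    const double y = *static_cast<const double *>(b);
    return (x > y) - (x < y);
}

// C-style sort: element size and comparison are run-time parameters.
void sort_with_qsort(std::vector<double> &v) {
    if (!v.empty()) {
        std::qsort(&v[0], v.size(), sizeof(double), compare_doubles);
    }
}

// C++-style sort: the element type and operator< are known at compile time.
void sort_with_std_sort(std::vector<double> &v) {
    std::sort(v.begin(), v.end());
}
```

Both functions produce the same ordering; the difference lies in how much the compiler can see when generating the inner loop.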
Re: [Numpy-discussion] How a transition to C++ could work
On Sun, Feb 19, 2012 at 3:45 AM, David Cournapeau courn...@gmail.comwrote: On Sun, Feb 19, 2012 at 9:19 AM, Mark Wiebe mwwi...@gmail.com wrote: On Sun, Feb 19, 2012 at 2:56 AM, David Cournapeau courn...@gmail.com wrote: Hi Mark, thank you for joining this discussion. On Sun, Feb 19, 2012 at 7:18 AM, Mark Wiebe mwwi...@gmail.com wrote: The suggestion of transitioning the NumPy core code from C to C++ has sparked a vigorous debate, and I thought I'd start a new thread to give my perspective on some of the issues raised, and describe how such a transition could occur. First, I'd like to reiterate the gcc rationale for their choice to switch: http://gcc.gnu.org/wiki/gcc-in-cxx#Rationale In particular, these points deserve emphasis: The C subset of C++ is just as efficient as C. C++ supports cleaner code in several significant cases. C++ makes it easier to write cleaner interfaces by making it harder to break interface boundaries. C++ never requires uglier code. I think those arguments will not be very useful: they are subjective, and unlikely to convince people who prefer C to C++. They are arguments from a team which implement both a C and a C++ compiler. In the spectrum of possible authorities on the matter, they rate about as high as I can imagine. There are quite a few arguments who are as authoritative and think those arguments are not very strong. They are as unlikely to change your mind as the gcc's arguments are unlikely to convince me I am afraid. I imagine only points 2 and 3 are controversial for you, 1 and 4 are pretty straightforward, yes? We could dig into the specifics of these points if you'd like. This is a necessary part of providing a C API, and is included as a requirement of doing that. All C++ libraries which expose a C API deal with this. The only two given examples given so far for a C library around C++ code (clang and zeromq) do not use exceptions. Can you provide an example of a C++ library that has a C API and does use exception ? 
I couldn't find a nice example with a short search, unfortunately.

If not, I would like to know the technical details if you don't mind expanding on them.

Sure. First, one would standardize on having all exceptions be derived from std::exception. (std::bad_alloc, which we discussed before, and all other standard exceptions, do). Then, each function exposed to the C API where internally C++ exceptions are used would look roughly like:

    int api_function(int param1, float param2, PyArrayObject *param3)
    {
        try {
            ... implementation ...
            return 0;
        } catch (const std::bad_alloc &) {
            PyErr_NoMemory();
            return -1;
        } catch (const numpy::convergence_error &e) {
            PyErr_SetString(NpyExc_ConvergenceError, e.what());
            return -1;
        } catch (const std::exception &e) {
            PyErr_SetString(PyExc_RuntimeError, e.what());
            return -1;
        }
    }

How can one make sure C++ extensions built by different compilers can work?

This is no different from the situation in C. Already in C on Windows, one can't build NumPy with a different version of Visual C++ than the one used to build CPython.

This is a different situation. On Windows, the mismatch between VS versions is due to the way win32 has been used by Python itself - it could actually be fixed eventually by Python (there are efforts in that regard). It is not a language issue.

I've already tried fixing this and building NumPy with Visual C++ 2010, and the memory allocation/deallocation issues were pretty easy to fix. The problem was that NumPy C code takes a FILE* object from a file opened from within Python code. The root of the issue is when CPython uses a different C runtime library (MSVCR##.dll) than NumPy.

Except for that case, numpy has a pretty good record of allowing people to mix and match compilers. Using mingw on Windows and Intel compilers on Linux are the typical cases, but not the only ones.

In these cases the compiler is adopting the name-mangling ABI of the compiler it's matching. On Windows, the Intel compiler uses the Visual C++ ABI, and on Linux, it uses the gcc ABI. But, since the CPython API is a C API, things would still work fine even if the name-mangling were different.

I would expect you would like to use templates for iterators in numpy - how can you make them available to 3rd party extensions without requiring C++?

Yes, something like the nditer is a good example. From C, it would have to retain an API in the current style, but C++ users could gain an easier-to-use variant.

Providing an official C++ library on top of the current C API would certainly be nice for people who prefer C++ to C. But this is quite different from using C++ at the core.

That's true.

The current way iterators work would be very hard (if at all possible?) to rewrite in idiomatic C++ while keeping even API compatibility with
Re: [Numpy-discussion] How a transition to C++ could work
On Sun, Feb 19, 2012 at 2:14 AM, David Cournapeau courn...@gmail.com wrote: On Sun, Feb 19, 2012 at 9:52 AM, Mark Wiebe mwwi...@gmail.com wrote: On Sun, Feb 19, 2012 at 3:10 AM, Ben Walsh ben_w_...@yahoo.co.uk wrote: Date: Sun, 19 Feb 2012 01:18:20 -0600 From: Mark Wiebe mwwi...@gmail.com Subject: [Numpy-discussion] How a transition to C++ could work To: Discussion of Numerical Python NumPy-Discussion@scipy.org Message-ID: CAMRnEmpVTmt=kdurpzktgui516oqtqd4vazm746hmpqgpfx...@mail.gmail.com Content-Type: text/plain; charset=utf-8 The suggestion of transitioning the NumPy core code from C to C++ has sparked a vigorous debate, and I thought I'd start a new thread to give my perspective on some of the issues raised, and describe how such a transition could occur. First, I'd like to reiterate the gcc rationale for their choice to switch: http://gcc.gnu.org/wiki/gcc-in-cxx#Rationale In particular, these points deserve emphasis: - The C subset of C++ is just as efficient as C. - C++ supports cleaner code in several significant cases. - C++ makes it easier to write cleaner interfaces by making it harder to break interface boundaries. - C++ never requires uglier code. I think they're trying to solve a different problem. I thought the problem that numpy was trying to solve is make inner loops of numerical algorithms very fast. C is great for this because you can write C code and picture precisely what assembly code will be generated. What you're describing is also the C subset of C++, so your experience applies just as well to C++! C++ removes some of this advantage -- now there is extra code generated by the compiler to handle constructors, destructors, operators etc which can make a material difference to fast inner loops. So you end up just writing C-style anyway. This is in fact not true, and writing in C++ style can often produce faster code. A classic example of this is C qsort vs C++ std::sort. 
You may be thinking of using virtual functions in a class hierarchy, where a tradeoff between performance and run-time polymorphism is being done. Emulating the functionality that virtual functions provide in C will give similar performance characteristics as the C++ language feature itself. On the other hand, if your problem really is write lots of OO code with virtual methods and have it turned into machine code (probably like the GCC guys) then maybe C++ is the way to go. Managing the complexity of the dtype subsystem, the ufunc subsystem, the nditer component, and other parts of NumPy could benefit from C++ Not in a stereotypical OO code with virtual methods way, that is not how typical modern C++ is done. Some more opinions on C++: http://gigamonkeys.wordpress.com/2009/10/16/coders-c-plus-plus/ Sorry if this all seems a bit negative about C++. It's just been my experience that C++ adds complexity while C keeps things nice and simple. Yes, there are lots of negative opinions about C++ out there, it's true. Just like there are negative opinions about C, Java, C#, and any other language which has become popular. My experience with regard to complexity and C vs C++ is that C forces the complexity of dealing with resource lifetimes out into all the code everyone writes, while C++ allows one to encapsulate that sort of complexity into a class which is small and more easily verifiable. This is about code quality, and the best quality C++ code I've worked with has been way easier to program in than the best quality C code I've worked with. While I actually believe this to be true (very good C++ can be easier to read/use than very good C). Good C is also much more common than good C++, at least in open source. On the good C++ codebases you have been working on, could you rely on everybody being a very good C++ programmer ? Because this will most likely never happen for numpy. 
This is the crux of the argument from an organizational POV: the variance in C++ code quality is much more difficult to control. I have seen C++ code that is certainly much poorer and more complex than numpy, to a point where not much could be done to save the codebase. Can this possibly be extended to the following: How will Mark's (extensive) experience about performance and long-term consequences of design decisions be communicated to future developers? We not only want new numpy developers, we want them to write good code without unintentional performance regressions. It seems like something more than just code guidelines would be required. There's also the issue that c++ compilation error messages can be awful and disheartening. Are there ways of making them not as bad by following certain coding styles, or is that baked in? (I know clang is moving towards making them much better, though.) -Chris cheers, David
Re: [Numpy-discussion] How a transition to C++ could work
On Sun, Feb 19, 2012 at 4:14 AM, David Cournapeau courn...@gmail.comwrote: On Sun, Feb 19, 2012 at 9:52 AM, Mark Wiebe mwwi...@gmail.com wrote: On Sun, Feb 19, 2012 at 3:10 AM, Ben Walsh ben_w_...@yahoo.co.uk wrote: Date: Sun, 19 Feb 2012 01:18:20 -0600 From: Mark Wiebe mwwi...@gmail.com Subject: [Numpy-discussion] How a transition to C++ could work To: Discussion of Numerical Python NumPy-Discussion@scipy.org Message-ID: CAMRnEmpVTmt=kdurpzktgui516oqtqd4vazm746hmpqgpfx...@mail.gmail.com Content-Type: text/plain; charset=utf-8 The suggestion of transitioning the NumPy core code from C to C++ has sparked a vigorous debate, and I thought I'd start a new thread to give my perspective on some of the issues raised, and describe how such a transition could occur. First, I'd like to reiterate the gcc rationale for their choice to switch: http://gcc.gnu.org/wiki/gcc-in-cxx#Rationale In particular, these points deserve emphasis: - The C subset of C++ is just as efficient as C. - C++ supports cleaner code in several significant cases. - C++ makes it easier to write cleaner interfaces by making it harder to break interface boundaries. - C++ never requires uglier code. I think they're trying to solve a different problem. I thought the problem that numpy was trying to solve is make inner loops of numerical algorithms very fast. C is great for this because you can write C code and picture precisely what assembly code will be generated. What you're describing is also the C subset of C++, so your experience applies just as well to C++! C++ removes some of this advantage -- now there is extra code generated by the compiler to handle constructors, destructors, operators etc which can make a material difference to fast inner loops. So you end up just writing C-style anyway. This is in fact not true, and writing in C++ style can often produce faster code. A classic example of this is C qsort vs C++ std::sort. 
You may be thinking of using virtual functions in a class hierarchy, where a tradeoff between performance and run-time polymorphism is being done. Emulating the functionality that virtual functions provide in C will give similar performance characteristics as the C++ language feature itself. On the other hand, if your problem really is write lots of OO code with virtual methods and have it turned into machine code (probably like the GCC guys) then maybe C++ is the way to go. Managing the complexity of the dtype subsystem, the ufunc subsystem, the nditer component, and other parts of NumPy could benefit from C++ Not in a stereotypical OO code with virtual methods way, that is not how typical modern C++ is done. Some more opinions on C++: http://gigamonkeys.wordpress.com/2009/10/16/coders-c-plus-plus/ Sorry if this all seems a bit negative about C++. It's just been my experience that C++ adds complexity while C keeps things nice and simple. Yes, there are lots of negative opinions about C++ out there, it's true. Just like there are negative opinions about C, Java, C#, and any other language which has become popular. My experience with regard to complexity and C vs C++ is that C forces the complexity of dealing with resource lifetimes out into all the code everyone writes, while C++ allows one to encapsulate that sort of complexity into a class which is small and more easily verifiable. This is about code quality, and the best quality C++ code I've worked with has been way easier to program in than the best quality C code I've worked with. While I actually believe this to be true (very good C++ can be easier to read/use than very good C). Good C is also much more common than good C++, at least in open source. On the good C++ codebases you have been working on, could you rely on everybody being a very good C++ programmer? Not initially, but I designed the coding standards and taught the programmers I hired how to write good C++ code. 
Because this will most likely never happen for numpy. This is the role I see good coding standards and consistent code review playing. Programmers who don't know how to write good C++ code can be taught. There are also good books to read, like C++ Coding Standards, Effective C++, and others that can help people learn proper technique. This is the crux of the argument from an organizational POV: the variance in C++ code quality is much more difficult to control. I have seen C++ code that is certainly much poorer and more complex than numpy, to a point where not much could be done to save the codebase. That's a consequence of the power C++ provides. It assumes the programmer knows what he or she is doing, and provides the tools to make things great or shoot oneself in the foot. I'd like to use that power to make NumPy better, in a way which uses high
Re: [Numpy-discussion] How a transition to C++ could work
On Sun, Feb 19, 2012 at 4:30 AM, Christopher Jordan-Squire cjord...@uw.eduwrote: On Sun, Feb 19, 2012 at 2:14 AM, David Cournapeau courn...@gmail.com wrote: On Sun, Feb 19, 2012 at 9:52 AM, Mark Wiebe mwwi...@gmail.com wrote: On Sun, Feb 19, 2012 at 3:10 AM, Ben Walsh ben_w_...@yahoo.co.uk wrote: Date: Sun, 19 Feb 2012 01:18:20 -0600 From: Mark Wiebe mwwi...@gmail.com Subject: [Numpy-discussion] How a transition to C++ could work To: Discussion of Numerical Python NumPy-Discussion@scipy.org Message-ID: CAMRnEmpVTmt=kdurpzktgui516oqtqd4vazm746hmpqgpfx...@mail.gmail.com Content-Type: text/plain; charset=utf-8 The suggestion of transitioning the NumPy core code from C to C++ has sparked a vigorous debate, and I thought I'd start a new thread to give my perspective on some of the issues raised, and describe how such a transition could occur. First, I'd like to reiterate the gcc rationale for their choice to switch: http://gcc.gnu.org/wiki/gcc-in-cxx#Rationale In particular, these points deserve emphasis: - The C subset of C++ is just as efficient as C. - C++ supports cleaner code in several significant cases. - C++ makes it easier to write cleaner interfaces by making it harder to break interface boundaries. - C++ never requires uglier code. I think they're trying to solve a different problem. I thought the problem that numpy was trying to solve is make inner loops of numerical algorithms very fast. C is great for this because you can write C code and picture precisely what assembly code will be generated. What you're describing is also the C subset of C++, so your experience applies just as well to C++! C++ removes some of this advantage -- now there is extra code generated by the compiler to handle constructors, destructors, operators etc which can make a material difference to fast inner loops. So you end up just writing C-style anyway. This is in fact not true, and writing in C++ style can often produce faster code. 
A classic example of this is C qsort vs C++ std::sort. You may be thinking of using virtual functions in a class hierarchy, where a tradeoff between performance and run-time polymorphism is being done. Emulating the functionality that virtual functions provide in C will give similar performance characteristics as the C++ language feature itself. On the other hand, if your problem really is write lots of OO code with virtual methods and have it turned into machine code (probably like the GCC guys) then maybe C++ is the way to go. Managing the complexity of the dtype subsystem, the ufunc subsystem, the nditer component, and other parts of NumPy could benefit from C++ Not in a stereotypical OO code with virtual methods way, that is not how typical modern C++ is done. Some more opinions on C++: http://gigamonkeys.wordpress.com/2009/10/16/coders-c-plus-plus/ Sorry if this all seems a bit negative about C++. It's just been my experience that C++ adds complexity while C keeps things nice and simple. Yes, there are lots of negative opinions about C++ out there, it's true. Just like there are negative opinions about C, Java, C#, and any other language which has become popular. My experience with regard to complexity and C vs C++ is that C forces the complexity of dealing with resource lifetimes out into all the code everyone writes, while C++ allows one to encapsulate that sort of complexity into a class which is small and more easily verifiable. This is about code quality, and the best quality C++ code I've worked with has been way easier to program in than the best quality C code I've worked with. While I actually believe this to be true (very good C++ can be easier to read/use than very good C). Good C is also much more common than good C++, at least in open source. On the good C++ codebases you have been working on, could you rely on everybody being a very good C++ programmer ? Because this will most likely never happen for numpy. 
This is the crux of the argument from an organizational POV: the variance in C++ code quality is much more difficult to control. I have seen C++ code that is certainly much poorer and more complex than numpy, to a point where not much could be done to save the codebase. Can this possibly be extended to the following: How will Mark's (extensive) experience about performance and long-term consequences of design decisions be communicated to future developers? We not only want new numpy developers, we want them to write good code without unintentional performance regressions. It seems like something more than just code guidelines would be required. I've tried to set a bit of an example to start with the NEPs I've written. The NEPs for both the nditer and the NA functionality are very long and detailed. Some
Re: [Numpy-discussion] How a transition to C++ could work
On 19 Feb 2012, at 09:51, Stéfan van der Walt ste...@sun.ac.za wrote: OK, so let's talk specifics: how do you dynamically grab a function pointer to a compiled C++ library, a la ctypes? Feel free to point me to StackOverflow or elsewhere.

You declare the function with the signature extern "C".

Sturla
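A minimal sketch of what this looks like, with made-up names (`sum_impl`, `demo_sum_doubles`): the internals can be ordinary C++ with mangled names, while the one function marked `extern "C"` gets a plain, unmangled symbol that a dynamic loader lookup (and therefore ctypes) can find by name.

```cpp
#include <cstddef>

namespace {

// Ordinary C++ internals: a template, which has no single C-visible name.
template <typename T>
T sum_impl(const T *values, std::size_t n) {
    T total = T();
    for (std::size_t i = 0; i < n; ++i) {
        total += values[i];
    }
    return total;
}

}  // namespace

// extern "C" turns off C++ name mangling for this function, so the shared
// library exports it under the symbol name "demo_sum_doubles".
extern "C" double demo_sum_doubles(const double *values, std::size_t n) {
    return sum_impl(values, n);
}
```

Built into a shared library, the function could then be looked up by the string "demo_sum_doubles", for example through ctypes, with the argument and return types declared on the Python side. An exception must not be allowed to escape from such a function.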
Re: [Numpy-discussion] How a transition to C++ could work
On Feb 19, 2012, at 2:18 AM, Mark Wiebe mwwi...@gmail.com wrote:

The suggestion of transitioning the NumPy core code from C to C++ has sparked a vigorous debate, and I thought I'd start a new thread to give my perspective on some of the issues raised, and describe how such a transition could occur.

First, I'd like to reiterate the gcc rationale for their choice to switch: http://gcc.gnu.org/wiki/gcc-in-cxx#Rationale

In particular, these points deserve emphasis:

- The C subset of C++ is just as efficient as C.
- C++ supports cleaner code in several significant cases.
- C++ makes it easier to write cleaner interfaces by making it harder to break interface boundaries.
- C++ never requires uglier code.

Some people have pointed out that the Python templating preprocessor used in NumPy is suggestive of C++ templates. A nice advantage of using C++ templates instead of this preprocessor is that third party tools to improve software quality, like static analysis tools, will be able to run directly on the NumPy source code. Additionally, IDEs like Xcode and Visual C++ will be able to provide the full suite of tab-completion/intellisense features that programmers working in those environments are accustomed to.

There are concerns about ABI/API interoperability and interactions with C++ exceptions. I've dealt with these types of issues on enough platforms to know that while they're important, they're a lot easier to handle than the issues with Fortran, BLAS, and LAPACK in SciPy. My experience has been that providing a C API from a C++ library is no harder than providing a C API from a C library.

It's worth comparing the possibility of C++ versus the possibility of other languages, and the ones that have been suggested for consideration are D, Cython, Rust, Fortran 2003, Go, RPython, C# and Java. The target language has to interact naturally with the CPython API. It needs to provide direct access to all the various sizes of signed int, unsigned int, and float. It needs to have mature compiler support wherever we want to deploy NumPy. Taken together, these requirements eliminate a majority of these possibilities. From these criteria, the only languages which seem to have a clear possibility for the implementation of NumPy are C, C++, and D. For D, I suspect the tooling is not mature enough, but I'm not 100% certain of that.

I am a huge fan of D, but you are dead on about its tooling, so +1 on the observation. Its code generation especially with respect to floating point is also a known area needing improvement IIRC.

The biggest question for any of these possibilities is how do you get the code from its current state to a state which fully utilizes the target language. C++, being nearly a superset of C, offers a strategy to gradually absorb C++ features. Any of the other language choices requires a rewrite, which would be quite disruptive. Because of all these reasons taken together, I believe the only realistic language to use, other than sticking with C, is C++.

Finally, here's what I think is the best strategy for transitioning to C++. First, let's consider what we do if 1.7 becomes an LTS release.

1) Immediately after branching for 1.7, we minimally patch all the .c files so that they can build with a C++ compiler and with a C compiler at the same time. Then we rename all .c -> .cpp, and update the build systems for C++.
2) During the 1.8 development cycle, we heavily restrict C++ feature usage. But, where a feature implementation would be arguably easier and less error-prone with C++, we allow it. This is a period for learning about C++ and how it can benefit NumPy.
3) After the 1.8 release, the community will have developed more experience with C++, and will be in a better position to discuss a way forward.

If, for some reason, a 1.7 LTS is unacceptable, it might be a good idea to restrict the 1.8 release to the subset of both C and C++. I would much prefer using the 1.8 development cycle to dip our toes into the C++ world to get some of the low-hanging benefits without doing anything disruptive.

A really important point to emphasize is that C++ allows for a strategy where we gradually evolve the codebase to better incorporate its language features. This is what I'm advocating. No massive rewrite, no disruptive changes. Gradual code evolution, with ABI and API compatibility comparable to what we've delivered in 1.6 and the upcoming 1.7 releases.

Thanks, Mark
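Step 1 of the quoted strategy (patching the .c files so that they build with a C compiler and a C++ compiler at the same time) mostly means staying inside the common subset of the two languages. The following is only a sketch of what such a file might look like; copy_doubles is a hypothetical helper invented for illustration and is not a NumPy function.

```cpp
/* Builds unchanged as C or as C++. */
#include <stdlib.h>
#include <string.h>

#ifdef __cplusplus
extern "C" {   /* keep C linkage so existing C callers still link */
#endif

/* Hypothetical helper: returns a heap copy of n doubles, or NULL on
 * allocation failure. The caller releases the result with free().
 * Overflow of n * sizeof(double) is not checked in this sketch. */
double *copy_doubles(const double *src, size_t n)
{
    /* C converts void* implicitly; C++ rejects that, so the cast is
     * written out. The explicit cast is valid in both languages. */
    double *dst = (double *)malloc(n * sizeof(double));
    if (dst == NULL) {
        return NULL;
    }
    memcpy(dst, src, n * sizeof(double));
    return dst;
}

#ifdef __cplusplus
}
#endif
```

Typical patches of this kind are explicit casts on malloc results, renaming identifiers that collide with C++ keywords (such as new or class), and extern "C" guards around declarations that C code must keep calling.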
Re: [Numpy-discussion] How a transition to C++ could work
On 19.02.2012 11:30, Christopher Jordan-Squire wrote:

Can this possibly be extended to the following: How will Mark's (extensive) experience about performance and long-term consequences of design decisions be communicated to future developers? We not only want new numpy developers, we want them to write good code without unintentional performance regressions. It seems like something more than just code guidelines would be required.

There are more examples of crappy than good C++ out there. There is a ton of literature on how to write crappy C++. And most programmers do not have the skill or knowledge to write good C++. My biggest issue with C++ is the variability of skills among programmers. It will result in code that is:

- unnecessarily complex
- ugly looking
- difficult to understand
- verbose and long
- inefficient
- full of subtle errors
- impossible to debug
- impossible to maintain
- not scalable with hardware
- dependent on one particular compiler

It is easier to achieve this with C++ than C. But it is also easier to avoid. Double-edged sword. It will take more than guidelines.

Sturla
Re: [Numpy-discussion] How a transition to C++ could work
On 19.02.2012 10:52, Mark Wiebe wrote:

C++ removes some of this advantage -- now there is extra code generated by the compiler to handle constructors, destructors, operators etc which can make a material difference to fast inner loops. So you end up just writing C-style anyway.

This is in fact not true, and writing in C++ style can often produce faster code. A classic example of this is C qsort vs C++ std::sort. You may be thinking of using virtual functions in a class hierarchy, where a tradeoff between performance and run-time polymorphism is being done. Emulating the functionality that virtual functions provide in C will give similar performance characteristics as the C++ language feature itself.

I agree with Mark here. C++ usually produces the faster code. C++ has abstractions that make it easier to write more efficient code. C++ provides more and better information to the compiler (e.g. strict aliasing rules). C++ compilers are also getting insanely good at optimisation, usually better than C compilers. But C++ also makes it easy to write sluggish bloatware, so the effect on performance is not predictable.

Sturla
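The qsort versus std::sort comparison mentioned above can be sketched as follows. The function names are made up for illustration. The point is structural: qsort receives its comparison as a function pointer operating on untyped memory, while std::sort is a template instantiated for the element type, so the comparison is visible to the compiler and can be inlined. Any actual speed difference depends on the compiler and the data, so no timings are claimed here.

```cpp
#include <algorithm>
#include <cstdlib>
#include <vector>

// C-style comparator: qsort calls it through a function pointer and
// hands over the elements as untyped void pointers.
static int compare_doubles(const void *a, const void *b) {
    const double x = *static_cast<const double *>(a);
    const double y = *static_cast<const double *>(b);
    return (x > y) - (x < y);
}

// Sorts a copy of v with the C library routine.
std::vector<double> sort_with_qsort(std::vector<double> v) {
    if (!v.empty()) {
        std::qsort(&v[0], v.size(), sizeof(double), compare_doubles);
    }
    return v;
}

// Sorts a copy of v with the C++ template; the comparison is the
// built-in operator< on double, known at compile time.
std::vector<double> sort_with_std_sort(std::vector<double> v) {
    std::sort(v.begin(), v.end());
    return v;
}
```

Both functions return the same ordering for ordinary (non-NaN) input; they differ only in how the comparison reaches the sorting loop.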
Re: [Numpy-discussion] How a transition to C++ could work
On Feb 19, 2012, at 10:38 AM, Sturla Molden stu...@molden.no wrote:

On 19.02.2012 10:52, Mark Wiebe wrote:

C++ removes some of this advantage -- now there is extra code generated by the compiler to handle constructors, destructors, operators etc which can make a material difference to fast inner loops. So you end up just writing C-style anyway.

This is in fact not true, and writing in C++ style can often produce faster code. A classic example of this is C qsort vs C++ std::sort. You may be thinking of using virtual functions in a class hierarchy, where a tradeoff between performance and run-time polymorphism is being done. Emulating the functionality that virtual functions provide in C will give similar performance characteristics as the C++ language feature itself.

I agree with Mark here. C++ usually produces the faster code. C++ has abstractions that make it easier to write more efficient code. C++ provides more and better information to the compiler (e.g. strict aliasing rules). C++ compilers are also getting insanely good at optimisation, usually better than C compilers. But C++ also makes it easy to write sluggish bloatware, so the effect on performance is not predictable.

Just to add, with respect to acceptable compilation times, a judicious choice of C++ features is critical.

Sturla
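The quoted remark about virtual functions can be illustrated with a small sketch. All names here (c_reducer, Reducer, SumReducer and the reduce functions) are hypothetical. Both versions perform one indirect call per element inside the loop, which is why emulating run-time polymorphism by hand in C is expected to cost about the same as the language feature.

```cpp
#include <cstddef>

// C-style emulation: a struct that carries a function pointer.
struct c_reducer {
    double (*combine)(double acc, double x);
};

double add_values(double acc, double x) { return acc + x; }

double reduce_c(const c_reducer *r, const double *data, std::size_t n) {
    double acc = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        acc = r->combine(acc, data[i]);   // indirect call per element
    }
    return acc;
}

// C++ language feature: an abstract base class with a virtual method.
struct Reducer {
    virtual ~Reducer() {}
    virtual double combine(double acc, double x) const = 0;
};

struct SumReducer : Reducer {
    double combine(double acc, double x) const { return acc + x; }
};

double reduce_cpp(const Reducer &r, const double *data, std::size_t n) {
    double acc = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        acc = r.combine(acc, data[i]);    // virtual call per element
    }
    return acc;
}
```

Where the operation is known at compile time, a template parameter could replace either form and let the compiler inline the call, which is the situation the std::sort example refers to.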
Re: [Numpy-discussion] How a transition to C++ could work
On 19.02.2012 16:45, Adam Klein wrote:

Just to add, with respect to acceptable compilation times, a judicious choice of C++ features is critical.

I use Python to avoid recompiling my code all the time. I don't recompile NumPy every time I use it. (I know you are thinking about development, but you have the wrong perspective.)

Sturla
Re: [Numpy-discussion] How a transition to C++ could work
On Sun, Feb 19, 2012 at 4:53 PM, Sturla Molden stu...@molden.no wrote:

On 19.02.2012 16:45, Adam Klein wrote:

Just to add, with respect to acceptable compilation times, a judicious choice of C++ features is critical.

I use Python to avoid recompiling my code all the time. I don't recompile NumPy every time I use it. (I know you are thinking about development, but you have the wrong perspective.)

No he doesn't. Perspectives aren't wrong, just different. I compile both numpy and scipy on a regular (almost daily) basis, and long compile times are very annoying.

Ralf
Re: [Numpy-discussion] How a transition to C++ could work
On Feb 19, 2012 2:41 AM, Mark Wiebe mwwi...@gmail.com wrote:

This is the role I see good coding standards and consistent code review playing. Programmers who don't know how to write good C++ code can be taught. There are also good books to read, like C++ Coding Standards, Effective C++, and others that can help people learn proper technique.

I recommend this book (one in the list above) to anyone who is not afraid of C++ yet: http://search.barnesandnoble.com/Effective-C/Scott-Meyers/e/9780321334879

With great power comes great responsibility.

Stéfan
[Numpy-discussion] How a transition to C++ could work
The suggestion of transitioning the NumPy core code from C to C++ has sparked a vigorous debate, and I thought I'd start a new thread to give my perspective on some of the issues raised, and describe how such a transition could occur.

First, I'd like to reiterate the gcc rationale for their choice to switch: http://gcc.gnu.org/wiki/gcc-in-cxx#Rationale

In particular, these points deserve emphasis:

- The C subset of C++ is just as efficient as C.
- C++ supports cleaner code in several significant cases.
- C++ makes it easier to write cleaner interfaces by making it harder to break interface boundaries.
- C++ never requires uglier code.

Some people have pointed out that the Python templating preprocessor used in NumPy is suggestive of C++ templates. A nice advantage of using C++ templates instead of this preprocessor is that third party tools to improve software quality, like static analysis tools, will be able to run directly on the NumPy source code. Additionally, IDEs like Xcode and Visual C++ will be able to provide the full suite of tab-completion/intellisense features that programmers working in those environments are accustomed to.

There are concerns about ABI/API interoperability and interactions with C++ exceptions. I've dealt with these types of issues on enough platforms to know that while they're important, they're a lot easier to handle than the issues with Fortran, BLAS, and LAPACK in SciPy. My experience has been that providing a C API from a C++ library is no harder than providing a C API from a C library.

It's worth comparing the possibility of C++ versus the possibility of other languages, and the ones that have been suggested for consideration are D, Cython, Rust, Fortran 2003, Go, RPython, C# and Java. The target language has to interact naturally with the CPython API. It needs to provide direct access to all the various sizes of signed int, unsigned int, and float. It needs to have mature compiler support wherever we want to deploy NumPy. Taken together, these requirements eliminate a majority of these possibilities. From these criteria, the only languages which seem to have a clear possibility for the implementation of NumPy are C, C++, and D. For D, I suspect the tooling is not mature enough, but I'm not 100% certain of that.

The biggest question for any of these possibilities is how do you get the code from its current state to a state which fully utilizes the target language. C++, being nearly a superset of C, offers a strategy to gradually absorb C++ features. Any of the other language choices requires a rewrite, which would be quite disruptive. Because of all these reasons taken together, I believe the only realistic language to use, other than sticking with C, is C++.

Finally, here's what I think is the best strategy for transitioning to C++. First, let's consider what we do if 1.7 becomes an LTS release.

1) Immediately after branching for 1.7, we minimally patch all the .c files so that they can build with a C++ compiler and with a C compiler at the same time. Then we rename all .c -> .cpp, and update the build systems for C++.
2) During the 1.8 development cycle, we heavily restrict C++ feature usage. But, where a feature implementation would be arguably easier and less error-prone with C++, we allow it. This is a period for learning about C++ and how it can benefit NumPy.
3) After the 1.8 release, the community will have developed more experience with C++, and will be in a better position to discuss a way forward.

If, for some reason, a 1.7 LTS is unacceptable, it might be a good idea to restrict the 1.8 release to the subset of both C and C++. I would much prefer using the 1.8 development cycle to dip our toes into the C++ world to get some of the low-hanging benefits without doing anything disruptive.

A really important point to emphasize is that C++ allows for a strategy where we gradually evolve the codebase to better incorporate its language features. This is what I'm advocating. No massive rewrite, no disruptive changes. Gradual code evolution, with ABI and API compatibility comparable to what we've delivered in 1.6 and the upcoming 1.7 releases.

Thanks, Mark
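The claim that a C API can be provided from a C++ library, together with the concern about C++ exceptions, is usually handled by catching everything at the extern "C" boundary and translating it into an error code. The following is a minimal sketch under stated assumptions: example_mean, mean_impl and the numeric error codes are hypothetical and not part of NumPy or of any proposal in this thread.

```cpp
#include <cstddef>
#include <new>
#include <stdexcept>
#include <vector>

// Internal C++ implementation; it is free to throw.
static double mean_impl(const std::vector<double> &v) {
    if (v.empty()) {
        throw std::invalid_argument("empty input");
    }
    double sum = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        sum += v[i];
    }
    return sum / static_cast<double>(v.size());
}

// Hypothetical C entry point. No exception may cross this boundary,
// because C callers cannot unwind C++ exceptions.
// Returns 0 on success, -1 on invalid input, -2 on out of memory,
// -3 on any other C++ exception.
extern "C" int example_mean(const double *data, std::size_t n, double *out) {
    if (out == NULL || (data == NULL && n != 0)) {
        return -1;
    }
    try {
        std::vector<double> v(data, data + n);  // may throw std::bad_alloc
        *out = mean_impl(v);
        return 0;
    } catch (const std::invalid_argument &) {
        return -1;
    } catch (const std::bad_alloc &) {
        return -2;
    } catch (...) {
        return -3;
    }
}
```

In a CPython extension the error codes would typically be turned into a Python exception at this same boundary, for example with PyErr_SetString, before returning to the interpreter.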