Re: [Numpy-discussion] How a transition to C++ could work

2012-02-19 Thread Stéfan van der Walt
Hey, Mark

On Feb 18, 2012 11:18 PM, Mark Wiebe mwwi...@gmail.com wrote:
 My experience has been that providing a C API from a C++ library is no
harder than providing a C API from a C library.

Interfacing to compiled C++ libs has been tricky, so can this concern be
dismissed so easily? (Some examples that came to mind were
_import_array--easy to fix because it is ours, I guess--or Cython generated
code).

 A really important point to emphasize is that C++ allows for a strategy
where we gradually evolve the codebase to better incorporate its language
features. This is what I'm advocating. No massive rewrite, no disruptive
changes. Gradual code evolution, with ABI and API compatibility comparable
to what we've delivered in 1.6 and the upcoming 1.7 releases.

If we're to switch to C++ (a language that can very easily be wielded in
terrible ways), then this certainly seems like a sound approach.

Regards
Stéfan
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] How a transition to C++ could work

2012-02-19 Thread Matthew Brett
Hi,

Thanks for this - it's very helpful.

On Sat, Feb 18, 2012 at 11:18 PM, Mark Wiebe mwwi...@gmail.com wrote:
 The suggestion of transitioning the NumPy core code from C to C++ has
 sparked a vigorous debate, and I thought I'd start a new thread to give my
 perspective on some of the issues raised, and describe how such a transition
 could occur.

 First, I'd like to reiterate the gcc rationale for their choice to switch:
 http://gcc.gnu.org/wiki/gcc-in-cxx#Rationale

 In particular, these points deserve emphasis:

 The C subset of C++ is just as efficient as C.
 C++ supports cleaner code in several significant cases.
 C++ makes it easier to write cleaner interfaces by making it harder to break
 interface boundaries.
 C++ never requires uglier code.

 Some people have pointed out that the Python templating preprocessor used in
 NumPy is suggestive of C++ templates. A nice advantage of using C++
 templates instead of this preprocessor is that third party tools to improve
 software quality, like static analysis tools, will be able to run directly
 on the NumPy source code. Additionally, IDEs like XCode and Visual C++ will
 be able to provide the full suite of tab-completion/intellisense features
 that programmers working in those environments are accustomed to.

 There are concerns about ABI/API interoperability and interactions with C++
 exceptions. I've dealt with these types of issues on enough platforms to
 know that while they're important, they're a lot easier to handle than the
 issues with Fortran, BLAS, and LAPACK in SciPy. My experience has been that
 providing a C API from a C++ library is no harder than providing a C API
 from a C library.

 It's worth comparing the possibility of C++ versus the possibility of other
 languages, and the ones that have been suggested for consideration are D,
 Cython, Rust, Fortran 2003, Go, RPython, C# and Java. The target language
 has to interact naturally with the CPython API. It needs to provide direct
 access to all the various sizes of signed int, unsigned int, and float. It
 needs to have mature compiler support wherever we want to deploy NumPy.
 Taken together, these requirements eliminate a majority of these
 possibilities. From these criteria, the only languages which seem to have a
 clear possibility for the implementation of Numpy are C, C++, and D.

On which criteria did you eliminate Cython?

 The biggest question for any of these possibilities is how do you get the
 code from its current state to a state which fully utilizes the target
 language. C++, being nearly a superset of C, offers a strategy to gradually
 absorb C++ features. Any of the other language choices requires a rewrite,
 which would be quite disruptive. Because of all these reasons taken
 together, I believe the only realistic language to use, other than sticking
 with C, is C++.

 Finally, here's what I think is the best strategy for transitioning to C++.
 First, let's consider what we do if 1.7 becomes an LTS release.

 1) Immediately after branching for 1.7, we minimally patch all the .c files
 so that they can build with a C++ compiler and with a C compiler at the same
 time. Then we rename all .c -> .cpp, and update the build systems for C++.
 2) During the 1.8 development cycle, we heavily restrict C++ feature usage.
 But, where a feature implementation would be arguably easier and less
 error-prone with C++, we allow it. This is a period for learning about C++
 and how it can benefit NumPy.
 3) After the 1.8 release, the community will have developed more experience
 with C++, and will be in a better position to discuss a way forward.

 If, for some reason, a 1.7 LTS is unacceptable, it might be a good idea to
 restrict the 1.8 release to the subset of both C and C++. I would much
 prefer using the 1.8 development cycle to dip our toes into the C++ world to
 get some of the low-hanging benefits without doing anything disruptive.

 A really important point to emphasize is that C++ allows for a strategy
 where we gradually evolve the codebase to better incorporate its language
 features. This is what I'm advocating. No massive rewrite, no disruptive
 changes. Gradual code evolution, with ABI and API compatibility comparable
 to what we've delivered in 1.6 and the upcoming 1.7 releases.
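
For step 1 of the plan quoted above, here is a minimal sketch of what "builds with a C++ compiler and with a C compiler at the same time" can look like. The function names are hypothetical and not taken from the NumPy sources; the point is only that an explicit cast on the malloc result and an extern "C" guard are typical of the small patches involved.

```cpp
#include <stddef.h>
#include <stdlib.h>

#ifdef __cplusplus
extern "C" {  /* keep C linkage when compiled as C++ */
#endif

/* Hypothetical helper, not a NumPy function. */
double *example_new_zeros(size_t n)
{
    size_t i;
    double *buffer;

    /* Refuse sizes whose byte count would overflow size_t. */
    if (n > (size_t)-1 / sizeof(double)) {
        return NULL;
    }
    /* C converts void* implicitly; C++ requires the explicit cast. */
    buffer = (double *)malloc(n * sizeof(double));
    if (buffer == NULL) {
        return NULL;
    }
    for (i = 0; i < n; ++i) {
        buffer[i] = 0.0;
    }
    return buffer;
}

/* Hypothetical counterpart that releases the buffer. */
void example_free(double *buffer)
{
    free(buffer);
}

#ifdef __cplusplus
}
#endif
```

The same file should compile unchanged as C or as C++, which is what would make the later .c to .cpp rename mechanical.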

Do you have any comment on the need for coding standards when using
C++?  I saw the warning in:

http://gcc.gnu.org/wiki/gcc-in-cxx#Rationale

about using C++ unwisely.

See you,

Matthew


Re: [Numpy-discussion] How a transition to C++ could work

2012-02-19 Thread Mark Wiebe
On Sun, Feb 19, 2012 at 2:24 AM, Stéfan van der Walt ste...@sun.ac.za wrote:

 Hey, Mark

 On Feb 18, 2012 11:18 PM, Mark Wiebe mwwi...@gmail.com wrote:
  My experience has been that providing a C API from a C++ library is no
 harder than providing a C API from a C library.

 Interfacing to compiled C++ libs has been tricky, so can this concern be
 dismissed so easily? (Some examples that came to mind were
 _import_array--easy to fix because it is ours, I guess--or Cython generated
 code).

I'm speaking from personal experience having dealt with these types of
issues extensively before. If people have more detailed examples of
problems, possibly links to discussions where one of these problems has
occurred, that would be helpful. This argument isn't very useful if it's
just my positive experience versus others' negative experience; we need to
get into specifics to advance the discussion.

-Mark

  A really important point to emphasize is that C++ allows for a strategy
 where we gradually evolve the codebase to better incorporate its language
 features. This is what I'm advocating. No massive rewrite, no disruptive
 changes. Gradual code evolution, with ABI and API compatibility comparable
 to what we've delivered in 1.6 and the upcoming 1.7 releases.

 If we're to switch to C++ (a language that can very easily be wielded in
 terrible ways), then this certainly seems like a sound approach.

 Regards
 Stéfan





Re: [Numpy-discussion] How a transition to C++ could work

2012-02-19 Thread Mark Wiebe
On Sun, Feb 19, 2012 at 2:32 AM, Matthew Brett matthew.br...@gmail.com wrote:

 Hi,

 Thanks for this - it's very helpful.

 On Sat, Feb 18, 2012 at 11:18 PM, Mark Wiebe mwwi...@gmail.com wrote:
  The suggestion of transitioning the NumPy core code from C to C++ has
  sparked a vigorous debate, and I thought I'd start a new thread to give
 my
  perspective on some of the issues raised, and describe how such a
 transition
  could occur.
 
  First, I'd like to reiterate the gcc rationale for their choice to
 switch:
  http://gcc.gnu.org/wiki/gcc-in-cxx#Rationale
 
  In particular, these points deserve emphasis:
 
  The C subset of C++ is just as efficient as C.
  C++ supports cleaner code in several significant cases.
  C++ makes it easier to write cleaner interfaces by making it harder to
 break
  interface boundaries.
  C++ never requires uglier code.
 
  Some people have pointed out that the Python templating preprocessor
 used in
  NumPy is suggestive of C++ templates. A nice advantage of using C++
  templates instead of this preprocessor is that third party tools to
 improve
  software quality, like static analysis tools, will be able to run
 directly
  on the NumPy source code. Additionally, IDEs like XCode and Visual C++
 will
  be able to provide the full suite of tab-completion/intellisense features
  that programmers working in those environments are accustomed to.
 
  There are concerns about ABI/API interoperability and interactions with
 C++
  exceptions. I've dealt with these types of issues on enough platforms to
  know that while they're important, they're a lot easier to handle than
 the
  issues with Fortran, BLAS, and LAPACK in SciPy. My experience has been
 that
  providing a C API from a C++ library is no harder than providing a C API
  from a C library.
 
  It's worth comparing the possibility of C++ versus the possibility of
 other
  languages, and the ones that have been suggested for consideration are D,
  Cython, Rust, Fortran 2003, Go, RPython, C# and Java. The target language
  has to interact naturally with the CPython API. It needs to provide
 direct
  access to all the various sizes of signed int, unsigned int, and float.
 It
  needs to have mature compiler support wherever we want to deploy NumPy.
  Taken together, these requirements eliminate a majority of these
  possibilities. From these criteria, the only languages which seem to
 have a
  clear possibility for the implementation of Numpy are C, C++, and D.

 On which criteria did you eliminate Cython?


The mature compiler support one. As glue between C/C++ and Python, it
looks great, but Dag's evaluation of Cython's maturity for implementing the
style of functionality in NumPy seems pretty authoritative. So people don't
have to dig through the giant email thread, here's the specific message
content from Dag, and its context:

On 02/18/2012 12:35 PM, Charles R Harris wrote:

 No one in their right mind would build a large performance library using
 Cython, it just isn't the right tool. For what it was designed for -
 wrapping existing c code or writing small and simple things close to
 Python - it does very well, but it was never designed for making core
 C/C++ libraries and in that role it just gets in the way.

+1. Even I who have contributed to Cython realize this; last autumn I
implemented a library by writing it in C and wrapping it in Cython.



  The biggest question for any of these possibilities is how do you get the
  code from its current state to a state which fully utilizes the target
  language. C++, being nearly a superset of C, offers a strategy to
 gradually
  absorb C++ features. Any of the other language choices requires a
 rewrite,
  which would be quite disruptive. Because of all these reasons taken
  together, I believe the only realistic language to use, other than
 sticking
  with C, is C++.
 
  Finally, here's what I think is the best strategy for transitioning to
 C++.
  First, let's consider what we do if 1.7 becomes an LTS release.
 
  1) Immediately after branching for 1.7, we minimally patch all the .c
 files
  so that they can build with a C++ compiler and with a C compiler at the
 same
  time. Then we rename all .c -> .cpp, and update the build systems for
 C++.
  2) During the 1.8 development cycle, we heavily restrict C++ feature
 usage.
  But, where a feature implementation would be arguably easier and less
  error-prone with C++, we allow it. This is a period for learning about
 C++
  and how it can benefit NumPy.
  3) After the 1.8 release, the community will have developed more
 experience
  with C++, and will be in a better position to discuss a way forward.
 
  If, for some reason, a 1.7 LTS is unacceptable, it might be a good idea
 to
  restrict the 1.8 release to the subset of both C and C++. I would much
  prefer using the 1.8 development cycle to dip our toes into the C++
 world to
  get some of the low-hanging benefits without doing anything disruptive.
 
  A really important 

Re: [Numpy-discussion] How a transition to C++ could work

2012-02-19 Thread David Cournapeau
Hi Mark,

thank you for joining this discussion.

On Sun, Feb 19, 2012 at 7:18 AM, Mark Wiebe mwwi...@gmail.com wrote:
 The suggestion of transitioning the NumPy core code from C to C++ has
 sparked a vigorous debate, and I thought I'd start a new thread to give my
 perspective on some of the issues raised, and describe how such a transition
 could occur.

 First, I'd like to reiterate the gcc rationale for their choice to switch:
 http://gcc.gnu.org/wiki/gcc-in-cxx#Rationale

 In particular, these points deserve emphasis:

 The C subset of C++ is just as efficient as C.
 C++ supports cleaner code in several significant cases.
 C++ makes it easier to write cleaner interfaces by making it harder to break
 interface boundaries.
 C++ never requires uglier code.

I think those arguments will not be very useful: they are subjective,
and unlikely to convince people who prefer C to C++.


 There are concerns about ABI/API interoperability and interactions with C++
 exceptions. I've dealt with these types of issues on enough platforms to
 know that while they're important, they're a lot easier to handle than the
 issues with Fortran, BLAS, and LAPACK in SciPy. My experience has been that
 providing a C API from a C++ library is no harder than providing a C API
 from a C library.

This needs more details. I have some experience in both areas as well,
and mine is quite different. Reiterating a few examples that worry me:
  - how can you ensure that exceptions raised in C++ will never
cross .so/.dll boundaries? How can one make sure C++ extensions built
by different compilers can work together? Is not using exceptions, as is
done in zeromq, acceptable? (It would be nice to find out more about the
decisions made by the zeromq team about their usage of C++.) I cannot
find a recent example, but I have seen errors similar to
this (http://software.intel.com/en-us/forums/showthread.php?t=42940)
quite a few times.
  - how can you expose to C features that rely heavily on C++? I would
expect you would like to use templates for iterators in numpy - how
can you make them available to 3rd party extensions without requiring
C++?


 It's worth comparing the possibility of C++ versus the possibility of other
 languages, and the ones that have been suggested for consideration are D,
 Cython, Rust, Fortran 2003, Go, RPython, C# and Java. The target language
 has to interact naturally with the CPython API. It needs to provide direct
 access to all the various sizes of signed int, unsigned int, and float. It
 needs to have mature compiler support wherever we want to deploy NumPy.
 Taken together, these requirements eliminate a majority of these
 possibilities. From these criteria, the only languages which seem to have a
 clear possibility for the implementation of Numpy are C, C++, and D. For D,
 I suspect the tooling is not mature enough, but I'm not 100% certain of
 that.

While I agree that no other language is realistic, staying in C has
the nice advantage that we can more easily use one of them if they
mature (rust/D - go, rpython, C#/java can be dismissed for fundamental
technical reasons right away). This is not a very strong argument
against using C++, obviously.


 1) Immediately after branching for 1.7, we minimally patch all the .c files
 so that they can build with a C++ compiler and with a C compiler at the same
 time. Then we rename all .c -> .cpp, and update the build systems for C++.
 2) During the 1.8 development cycle, we heavily restrict C++ feature usage.
 But, where a feature implementation would be arguably easier and less
 error-prone with C++, we allow it. This is a period for learning about C++
 and how it can benefit NumPy.
 3) After the 1.8 release, the community will have developed more experience
 with C++, and will be in a better position to discuss a way forward.

A step that would be useful sooner rather than later is one where
numpy has been split into smaller extensions (instead of
multiarray/ufunc, essentially). This would help avoid recompilation
of lots of code for any small change. It is already quite painful with
C, but with C++, it will be unbearable. This can be done in C, and
would be useful whether the decision to move to C++ is accepted or
not.

cheers,

David


Re: [Numpy-discussion] How a transition to C++ could work

2012-02-19 Thread Mark Wiebe
On Sun, Feb 19, 2012 at 2:51 AM, Stéfan van der Walt ste...@sun.ac.za wrote:


 On Feb 19, 2012 12:34 AM, Mark Wiebe mwwi...@gmail.com wrote:
 
  I'm speaking from personal experience having dealt with these types of
 issues extensively before. If people have more detailed examples of
 problems, possibly links to discussions where one of these problems has
 occurred, that would be helpful. This argument isn't very useful if it's
 just my positive experience versus others' negative experience; we need to
 get into specifics to advance the discussion.

 Wow, the NumPy list has gotten so serious :) I'm certainly not doubting
 anyone's experience--just trying to get a handle on possible transition
 risks.

 Heh, when threads get longer than 50 messages, I think that's a sign
something is serious!

 OK, so let's talk specifics: how do you dynamically grab a function
 pointer to a compiled C++ library, a la ctypes? Feel free to point me to
 StackOverflow or elsewhere.

If the C++ library is exposing a C-API, it's identical to the case for C.

If it's not, and you must access the functions via ctypes anyway, you need
to determine the mangled name of the function. The mangled name encodes the
types of the parameters, to support function polymorphism, and is different
for each OS platform. Also, if the function takes a class object as a
parameter, or returns one, ctypes doesn't give you a way to forward that
parameter.

In general, the standard advice is to wrap the C++ library using
Boost.Python, Cython, or something similar. Dealing directly with the
mangled names, while possible, is not likely to make you happy.
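
To make the first case concrete, here is a minimal sketch of a C++ library exposing a C API. All names are hypothetical. The extern "C" function is exported under its plain, unmangled name and takes only C types, so a ctypes-style lookup by name works the same way as it would for a C library.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>

namespace detail {
// Ordinary C++ implementation. Its exported symbol name would be mangled,
// and the mangling differs between platforms and compilers.
template <typename Iter>
double scale_sum(Iter first, Iter last, double factor) {
    return factor * std::accumulate(first, last, 0.0);
}
}  // namespace detail

// extern "C" disables name mangling, so the symbol is exactly
// "example_scale_sum" and only plain C types cross the boundary.
extern "C" double example_scale_sum(const double* data, std::size_t n,
                                    double factor) {
    // Debug-build precondition: a null pointer is only valid for an empty range.
    assert(data != nullptr || n == 0);
    if (n == 0) {
        return 0.0;
    }
    return detail::scale_sum(data, data + n, factor);
}
```

From Python, one would load the shared library with ctypes, look up example_scale_sum by name, and set its restype to c_double before calling it.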

Cheers,
Mark

  Stéfan





Re: [Numpy-discussion] How a transition to C++ could work

2012-02-19 Thread Matthew Brett
Hi,

On Sun, Feb 19, 2012 at 12:49 AM, Mark Wiebe mwwi...@gmail.com wrote:
 On Sun, Feb 19, 2012 at 2:32 AM, Matthew Brett matthew.br...@gmail.com
 wrote:

 Hi,

 Thanks for this - it's very helpful.

 On Sat, Feb 18, 2012 at 11:18 PM, Mark Wiebe mwwi...@gmail.com wrote:
  The suggestion of transitioning the NumPy core code from C to C++ has
  sparked a vigorous debate, and I thought I'd start a new thread to give
  my
  perspective on some of the issues raised, and describe how such a
  transition
  could occur.
 
  First, I'd like to reiterate the gcc rationale for their choice to
  switch:
  http://gcc.gnu.org/wiki/gcc-in-cxx#Rationale
 
  In particular, these points deserve emphasis:
 
  The C subset of C++ is just as efficient as C.
  C++ supports cleaner code in several significant cases.
  C++ makes it easier to write cleaner interfaces by making it harder to
  break
  interface boundaries.
  C++ never requires uglier code.
 
  Some people have pointed out that the Python templating preprocessor
  used in
  NumPy is suggestive of C++ templates. A nice advantage of using C++
  templates instead of this preprocessor is that third party tools to
  improve
  software quality, like static analysis tools, will be able to run
  directly
  on the NumPy source code. Additionally, IDEs like XCode and Visual C++
  will
  be able to provide the full suite of tab-completion/intellisense
  features
  that programmers working in those environments are accustomed to.
 
  There are concerns about ABI/API interoperability and interactions with
  C++
  exceptions. I've dealt with these types of issues on enough platforms to
  know that while they're important, they're a lot easier to handle than
  the
  issues with Fortran, BLAS, and LAPACK in SciPy. My experience has been
  that
  providing a C API from a C++ library is no harder than providing a C API
  from a C library.
 
  It's worth comparing the possibility of C++ versus the possibility of
  other
  languages, and the ones that have been suggested for consideration are
  D,
  Cython, Rust, Fortran 2003, Go, RPython, C# and Java. The target
  language
  has to interact naturally with the CPython API. It needs to provide
  direct
  access to all the various sizes of signed int, unsigned int, and float.
  It
  needs to have mature compiler support wherever we want to deploy NumPy.
  Taken together, these requirements eliminate a majority of these
  possibilities. From these criteria, the only languages which seem to
  have a
  clear possibility for the implementation of Numpy are C, C++, and D.

 On which criteria did you eliminate Cython?


 The mature compiler support one.

I took you to mean that the code would compile on any platform.  As
Cython compiles to C, I think Cython passes, if that is what you
meant.  Maybe you meant you thought that Cython was not mature in some
sense, but if so, I'm not sure which sense you mean.

 As glue between C/C++ and Python, it
 looks great, but Dag's evaluation of Cython's maturity for implementing the
 style of functionality in NumPy seems pretty authoritative. So people don't
 have to dig through the giant email thread, here's the specific message
 content from Dag, and its context:

 On 02/18/2012 12:35 PM, Charles R Harris wrote:

 No one in their right mind would build a large performance library using
 Cython, it just isn't the right tool. For what it was designed for -
 wrapping existing c code or writing small and simple things close to
 Python - it does very well, but it was never designed for making core
 C/C++ libraries and in that role it just gets in the way.

 +1. Even I who have contributed to Cython realize this; last autumn I
 implemented a library by writing it in C and wrapping it in Cython.

As you probably saw, I think the proposal was indeed to use Cython to
provide the higher-level parts of the core, while refactoring the rest
of the C code underneath it.  Obviously one could also refactor the C
into C++, so the proposal to use Cython for some of the core is to
some extent orthogonal to the choice of C / C++.  I don't know the
core, perhaps there isn't much of it that would benefit from being in
Cython, I'd be interested to know your views.  But, superficially, it
seems like an attractive solution to making (some of) the core easier
to maintain.

Best,

Matthew


Re: [Numpy-discussion] How a transition to C++ could work

2012-02-19 Thread Ben Walsh


 Date: Sun, 19 Feb 2012 01:18:20 -0600
 From: Mark Wiebe mwwi...@gmail.com
 Subject: [Numpy-discussion] How a transition to C++ could work
 To: Discussion of Numerical Python NumPy-Discussion@scipy.org
 Message-ID:
   CAMRnEmpVTmt=kdurpzktgui516oqtqd4vazm746hmpqgpfx...@mail.gmail.com
 Content-Type: text/plain; charset=utf-8

 The suggestion of transitioning the NumPy core code from C to C++ has
 sparked a vigorous debate, and I thought I'd start a new thread to give my
 perspective on some of the issues raised, and describe how such a
 transition could occur.

 First, I'd like to reiterate the gcc rationale for their choice to switch:
 http://gcc.gnu.org/wiki/gcc-in-cxx#Rationale

 In particular, these points deserve emphasis:

   - The C subset of C++ is just as efficient as C.
   - C++ supports cleaner code in several significant cases.
   - C++ makes it easier to write cleaner interfaces by making it harder to
   break interface boundaries.
   - C++ never requires uglier code.


I think they're trying to solve a different problem.

I thought the problem that numpy was trying to solve is to make inner loops
of numerical algorithms very fast. C is great for this because you can 
write C code and picture precisely what assembly code will be generated.

C++ removes some of this advantage -- now there is extra code generated by 
the compiler to handle constructors, destructors, operators etc which can 
make a material difference to fast inner loops. So you end up just writing 
C-style anyway.

On the other hand, if your problem really is to write lots of OO code with
virtual methods and have it turned into machine code (probably like the 
GCC guys) then maybe C++ is the way to go.

Some more opinions on C++: 
http://gigamonkeys.wordpress.com/2009/10/16/coders-c-plus-plus/

Sorry if this all seems a bit negative about C++. It's just been my 
experience that C++ adds complexity while C keeps things nice and simple.

Looking forward to seeing some more concrete examples.

Cheers

Ben


Re: [Numpy-discussion] How a transition to C++ could work

2012-02-19 Thread Mark Wiebe
On Sun, Feb 19, 2012 at 2:56 AM, David Cournapeau courn...@gmail.com wrote:

 Hi Mark,

 thank you for joining this discussion.

 On Sun, Feb 19, 2012 at 7:18 AM, Mark Wiebe mwwi...@gmail.com wrote:
  The suggestion of transitioning the NumPy core code from C to C++ has
  sparked a vigorous debate, and I thought I'd start a new thread to give
 my
  perspective on some of the issues raised, and describe how such a
 transition
  could occur.
 
  First, I'd like to reiterate the gcc rationale for their choice to
 switch:
  http://gcc.gnu.org/wiki/gcc-in-cxx#Rationale
 
  In particular, these points deserve emphasis:
 
  The C subset of C++ is just as efficient as C.
  C++ supports cleaner code in several significant cases.
  C++ makes it easier to write cleaner interfaces by making it harder to
 break
  interface boundaries.
  C++ never requires uglier code.

 I think those arguments will not be very useful: they are subjective,
 and unlikely to convince people who prefer C to C++.


They are arguments from a team which implements both a C and a C++ compiler.
In the spectrum of possible authorities on the matter, they rate about as
high as I can imagine.


 
  There are concerns about ABI/API interoperability and interactions with
 C++
  exceptions. I've dealt with these types of issues on enough platforms to
  know that while they're important, they're a lot easier to handle than
 the
  issues with Fortran, BLAS, and LAPACK in SciPy. My experience has been
 that
  providing a C API from a C++ library is no harder than providing a C API
  from a C library.

 This needs more details. I have some experience in both areas as well,
 and mine is quite different. Reiterating a few examples that worry me:
  - how can you ensure that exceptions happening in C++ will never
 cross different .so/.dll ?


This is a necessary part of providing a C API, and is included as a
requirement of doing that. All C++ libraries which expose a C API deal with
this.
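
A minimal sketch of the usual pattern, with hypothetical names and error codes: every extern "C" entry point wraps its body in try/catch and converts any exception into a status code, so nothing propagates across the library boundary.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <numeric>
#include <stdexcept>
#include <vector>

// Hypothetical status codes for this sketch; not an existing NumPy API.
enum example_status {
    EXAMPLE_OK = 0,
    EXAMPLE_NO_MEMORY = 1,
    EXAMPLE_BAD_ARGUMENT = 2,
    EXAMPLE_UNKNOWN = 3
};

namespace detail {
// Internal C++ code is free to throw.
double mean(const double* data, std::size_t n) {
    if (n == 0) {
        throw std::invalid_argument("mean of an empty range");
    }
    assert(data != nullptr);  // the C wrapper below has already checked this
    // The copy is only here to show a call that may throw std::bad_alloc.
    std::vector<double> copy(data, data + n);
    return std::accumulate(copy.begin(), copy.end(), 0.0) /
           static_cast<double>(n);
}
}  // namespace detail

// The C entry point: no exception is allowed to leave this function.
extern "C" int example_mean(const double* data, std::size_t n, double* out) {
    try {
        if (out == nullptr || (data == nullptr && n != 0)) {
            return EXAMPLE_BAD_ARGUMENT;
        }
        *out = detail::mean(data, n);
        return EXAMPLE_OK;
    } catch (const std::bad_alloc&) {
        return EXAMPLE_NO_MEMORY;
    } catch (const std::invalid_argument&) {
        return EXAMPLE_BAD_ARGUMENT;
    } catch (...) {
        return EXAMPLE_UNKNOWN;
    }
}
```

In a CPython extension the catch blocks would more likely set a Python error and return NULL or -1, but the structure would be the same.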


 How can one make sure C++ extensions built
 by different compilers can work ?


This is no different from the situation in C. Already in C on Windows, one
can't build NumPy with a different version of Visual C++ than the one used
to build CPython.


 Is not using exceptions like it is
 done in zeromq acceptable ? (would be nice to find out more about the
 decisions made by the zeromq team about their usage of C++).


I prefer to use exceptions in C++, but some major projects have decided to
disable them. LLVM/Clang is the most notable example. My experience working
with high-performance graphics code has been that appropriate use of
exceptions (i.e. not doing something like using them for control flow) does
not pose a problem.

I cannot
 find a recent example, but I have seen errors similar to
 this(http://software.intel.com/en-us/forums/showthread.php?t=42940)
 quite a few times.


This kind of thing would happen when using 'new' to allocate memory, and
with the compiler setting enabled to raise bad_alloc on such allocation
failures (the default for most compilers nowadays). If exception handling
is disabled in the compiler, new will return NULL instead. Unless the
compiler has a bizarre issue, catching either std::exception or
std::bad_alloc specifically within NumPy should be sufficient to deal with
it. Also note that the possibility of something like this will only arise
once more advanced C++ features are being adopted.
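
As a minimal sketch with hypothetical helper names, these are two standard ways to keep an allocation failure from escaping as an exception: catch std::bad_alloc at the boundary, or use the nothrow form of new, which reports failure as a null pointer regardless of compiler settings.

```cpp
#include <cstddef>
#include <new>

// Option 1: let `new` throw and convert the failure at the C boundary.
extern "C" double* example_alloc_catching(std::size_t n) {
    try {
        return new double[n];
    } catch (const std::bad_alloc&) {
        // std::bad_array_new_length derives from std::bad_alloc,
        // so an oversized request is caught here as well.
        return nullptr;
    }
}

// Option 2: the standard nothrow form returns a null pointer on failure.
extern "C" double* example_alloc_nothrow(std::size_t n) {
    return new (std::nothrow) double[n];
}

// Both kinds of buffer are released the same way.
extern "C" void example_free_array(double* p) {
    delete[] p;  // deleting a null pointer is a no-op
}
```

Either way the caller sees a plain null check, which is the same contract malloc offers in C.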

 - how can you expose to C features that rely heavily on C++?


If the advantages of those C++ features depend on the C++ language, you
have to map them to a limited subset of the feature in C. For example, if a
feature is based on a C++ template, you can instantiate specific instances
of the template for all the types you want to support from C.
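
A minimal sketch of that mapping, with hypothetical names: the algorithm is written once as a template, and each type to be supported from C gets its own extern "C" function that instantiates it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace detail {
// One generic implementation, visible only to C++ code.
template <typename T>
T sum(const T* data, std::size_t n) {
    assert(data != nullptr || n == 0);  // debug-build precondition
    T total = T();
    for (std::size_t i = 0; i < n; ++i) {
        total += data[i];
    }
    return total;
}
}  // namespace detail

// One C-callable wrapper per supported type; each call instantiates the
// template for that type, and the wrapper is exported under a fixed name.
extern "C" double example_sum_f64(const double* data, std::size_t n) {
    return detail::sum<double>(data, n);
}

extern "C" float example_sum_f32(const float* data, std::size_t n) {
    return detail::sum<float>(data, n);
}

extern "C" std::int64_t example_sum_i64(const std::int64_t* data,
                                        std::size_t n) {
    return detail::sum<std::int64_t>(data, n);
}
```

C callers only ever see the three plain functions, while C++ callers could use detail::sum directly with any type that supports +=.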


 I would
 expect you would like to use templates for iterators in numpy - how
 can you make them available to 3rd party extensions without requiring
 C++?


Yes, something like the nditer is a good example. From C, it would have to
retain an API in the current style, but C++ users could gain an
easier-to-use variant.



 
  It's worth comparing the possibility of C++ versus the possibility of
 other
  languages, and the ones that have been suggested for consideration are D,
  Cython, Rust, Fortran 2003, Go, RPython, C# and Java. The target language
  has to interact naturally with the CPython API. It needs to provide
 direct
  access to all the various sizes of signed int, unsigned int, and float.
 It
  needs to have mature compiler support wherever we want to deploy NumPy.
  Taken together, these requirements eliminate a majority of these
  possibilities. From these criteria, the only languages which seem to
 have a
  clear possibility for the implementation of Numpy are C, C++, and D. For
 D,
  I suspect the tooling is not mature enough, but I'm not 100% certain of
  that.

 While I agree that no other language is realistic, staying in C has
 the nice advantage that we can more easily use one of them if 

Re: [Numpy-discussion] How a transition to C++ could work

2012-02-19 Thread Mark Wiebe
On Sun, Feb 19, 2012 at 3:10 AM, Matthew Brett matthew.br...@gmail.com wrote:

 <snip>
 As you probably saw, I think the proposal was indeed to use Cython to
 provide the higher-level parts of the core, while refactoring the rest
 of the C code underneath it.  Obviously one could also refactor the C
 into C++, so the proposal to use Cython for some of the core is to
 some extent orthogonal to the choice of C / C++.  I don't know the
 core, perhaps there isn't much of it that would benefit from being in
 Cython, I'd be interested to know your views.  But, superficially, it
 seems like an attractive solution to making (some of) the core easier
 to maintain.


Using Cython in the binding role is orthogonal to the choice of C versus
C++, you are right. This binding aspect isn't the part where C++ provides
most of the benefits I envision, so increasing (or decreasing) the use of
Cython within NumPy seems like a good topic for a separate thread just
about Cython.

Cheers,
Mark



 Best,

 Matthew


Re: [Numpy-discussion] How a transition to C++ could work

2012-02-19 Thread David Cournapeau
On Sun, Feb 19, 2012 at 9:19 AM, Mark Wiebe mwwi...@gmail.com wrote:
 On Sun, Feb 19, 2012 at 2:56 AM, David Cournapeau courn...@gmail.com
 wrote:

 Hi Mark,

 thank you for joining this discussion.

 On Sun, Feb 19, 2012 at 7:18 AM, Mark Wiebe mwwi...@gmail.com wrote:
  The suggestion of transitioning the NumPy core code from C to C++ has
  sparked a vigorous debate, and I thought I'd start a new thread to give
  my
  perspective on some of the issues raised, and describe how such a
  transition
  could occur.
 
  First, I'd like to reiterate the gcc rationale for their choice to
  switch:
  http://gcc.gnu.org/wiki/gcc-in-cxx#Rationale
 
  In particular, these points deserve emphasis:
 
  The C subset of C++ is just as efficient as C.
  C++ supports cleaner code in several significant cases.
  C++ makes it easier to write cleaner interfaces by making it harder to
  break
  interface boundaries.
  C++ never requires uglier code.

 I think those arguments will not be very useful: they are subjective,
 and unlikely to convince people who prefer C to C++.


 They are arguments from a team which implement both a C and a C++ compiler.
 In the spectrum of possible authorities on the matter, they rate about as
 high as I can imagine.

There are quite a few people who are just as authoritative and think
those arguments are not very strong. They are as unlikely to change
your mind as gcc's arguments are to convince me, I am
afraid.


 This is a necessary part of providing a C API, and is included as a
 requirement of doing that. All C++ libraries which expose a C API deal with
 this.

The only two examples given so far of a C library around C++
code (clang and zeromq) do not use exceptions. Can you provide an
example of a C++ library that has a C API and does use exceptions?

If not, I would like to know the technical details if you don't mind
expanding on them.



 How can one make sure C++ extensions built
 by different compilers can work ?


 This is no different from the situation in C. Already in C on Windows, one
 can't build NumPy with a different version of Visual C++ than the one used
 to build CPython.

This is a different situation. On Windows, the mismatch between VS versions is
due to the way win32 has been used by Python itself - it could
actually be fixed eventually by Python (there are efforts in that
regard). It is not a language issue.

Except for that case, numpy has a pretty good record of allowing
people to mix and match compilers. Using mingw on Windows and Intel
compilers on Linux are the typical cases, but not the only ones.


 I would
 expect you would like to use templates for iterators in numpy - you
 can you make them available to 3rd party extensions without requiring
 C++.


 Yes, something like the nditer is a good example. From C, it would have to
 retain an API in the current style, but C++ users could gain an
 easier-to-use variant.

Providing an official C++ library on top of the current C API would
certainly be nice for people who prefer C++ to C. But this is quite
different from using C++ at the core.

The current way iterators work would be very hard (if at all possible?)
to rewrite in idiomatic C++ while keeping even API compatibility
with the existing C one. For numpy 2.0, we can relax this somewhat.
If it is not too time consuming, could you show a simplified example
of how one would write the iterator in C++ while providing a C
API in the spirit of what we have now?

David


Re: [Numpy-discussion] How a transition to C++ could work

2012-02-19 Thread Mark Wiebe
On Sun, Feb 19, 2012 at 3:10 AM, Ben Walsh ben_w_...@yahoo.co.uk wrote:



  Date: Sun, 19 Feb 2012 01:18:20 -0600
  From: Mark Wiebe mwwi...@gmail.com
  Subject: [Numpy-discussion] How a transition to C++ could work
  To: Discussion of Numerical Python NumPy-Discussion@scipy.org
  Message-ID: CAMRnEmpVTmt=kdurpzktgui516oqtqd4vazm746hmpqgpfx...@mail.gmail.com
  Content-Type: text/plain; charset=utf-8
 
  The suggestion of transitioning the NumPy core code from C to C++ has
  sparked a vigorous debate, and I thought I'd start a new thread to give
 my
  perspective on some of the issues raised, and describe how such a
  transition could occur.
 
  First, I'd like to reiterate the gcc rationale for their choice to
 switch:
  http://gcc.gnu.org/wiki/gcc-in-cxx#Rationale
 
  In particular, these points deserve emphasis:
 
    - The C subset of C++ is just as efficient as C.
    - C++ supports cleaner code in several significant cases.
    - C++ makes it easier to write cleaner interfaces by making it harder
      to break interface boundaries.
    - C++ never requires uglier code.
 

 I think they're trying to solve a different problem.

 I thought the problem that numpy was trying to solve is make inner loops
 of numerical algorithms very fast. C is great for this because you can
 write C code and picture precisely what assembly code will be generated.


What you're describing is also the C subset of C++, so your experience
applies just as well to C++!


 C++ removes some of this advantage -- now there is extra code generated by
 the compiler to handle constructors, destructors, operators etc which can
 make a material difference to fast inner loops. So you end up just writing
 C-style anyway.


This is in fact not true, and writing in C++ style can often produce faster
code. A classic example of this is C qsort vs C++ std::sort. You may be
thinking of using virtual functions in a class hierarchy, where a tradeoff
between performance and run-time polymorphism is being made. Emulating the
functionality that virtual functions provide in C will give similar
performance characteristics to the C++ language feature itself.
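For readers who have not seen the qsort vs std::sort comparison, here is a minimal sketch of the two calls side by side. The function names compare_doubles, sort_with_qsort and sort_with_std_sort are made up for illustration. The usual explanation for the speed difference is that qsort calls its comparator through a function pointer on untyped memory, while std::sort is a template and the compiler can inline the comparison; no timings are claimed here, since they depend on compiler and flags.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

// Comparator in the shape qsort requires: untyped pointers, called
// indirectly through a function pointer for every comparison.
extern "C" int compare_doubles(const void *a, const void *b) {
    assert(a != nullptr && b != nullptr);
    const double x = *static_cast<const double *>(a);
    const double y = *static_cast<const double *>(b);
    return (x > y) - (x < y);
}

// C style: element size and comparator are passed at run time.
void sort_with_qsort(std::vector<double> &v) {
    if (v.empty()) return;
    std::qsort(v.data(), v.size(), sizeof(double), compare_doubles);
}

// C++ style: element type and comparison (operator<) are known at
// compile time, so the comparison can be inlined into the sort loop.
void sort_with_std_sort(std::vector<double> &v) {
    std::sort(v.begin(), v.end());
}
```

Both functions leave the vector in the same sorted order; only the way the comparison reaches the sorting routine differs.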


 On the other hand, if your problem really is write lots of OO code with
 virtual methods and have it turned into machine code (probably like the
 GCC guys) then maybe C++ is the way to go.


Managing the complexity of the dtype subsystem, the ufunc subsystem, the
nditer component, and other parts of NumPy could benefit from C++. Not in a
stereotypical "OO code with virtual methods" way, though; that is not how
typical modern C++ is done.


 Some more opinions on C++:
 http://gigamonkeys.wordpress.com/2009/10/16/coders-c-plus-plus/

 Sorry if this all seems a bit negative about C++. It's just been my
 experience that C++ adds complexity while C keeps things nice and simple.


Yes, there are lots of negative opinions about C++ out there, it's true.
Just like there are negative opinions about C, Java, C#, and any other
language which has become popular. My experience with regard to complexity
and C vs C++ is that C forces the complexity of dealing with resource
lifetimes out into all the code everyone writes, while C++ allows one to
encapsulate that sort of complexity into a class which is small and more
easily verifiable. This is about code quality, and the best quality C++
code I've worked with has been way easier to program in than the best
quality C code I've worked with.

Looking forward to seeing some more concrete examples.


In the interests of starting small, here's one that I mentioned in the
other thread:

Consider a regression like this:
http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057831.html

Fixing this in C would require switching all the relevant usages of
NPY_MAXARGS to use a dynamic memory allocation. This brings with it the
potential of easily introducing a memory leak, and is a lot of work to do.
In C++, this functionality could be placed inside a class, where the
deterministic construction/destruction semantics eliminate the risk of
memory leaks and make the code easier to read at the same time.
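A rough sketch of the kind of class meant here, under stated assumptions: scratch_array and sum_first_n are hypothetical names, not NumPy code, and the inline capacity of 32 simply stands in for a fixed limit such as NPY_MAXARGS. Small sizes use storage inside the object, larger sizes fall back to the heap, and the destructor releases the heap block on every exit path.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of NumPy): a fixed-size scratch buffer
// that lives inline when small and on the heap when large.
template <typename T, std::size_t InlineCapacity>
class scratch_array {
public:
    explicit scratch_array(std::size_t n)
        : size_(n),
          inline_(),  // value-initialize the inline storage
          data_(n <= InlineCapacity ? inline_ : new T[n]()) {}

    // Runs on every exit path, including early returns and exceptions.
    ~scratch_array() {
        if (data_ != inline_) delete[] data_;
    }

    // Copying is disabled to keep ownership of the heap block unambiguous.
    scratch_array(const scratch_array &) = delete;
    scratch_array &operator=(const scratch_array &) = delete;

    T &operator[](std::size_t i) {
        assert(i < size_);
        return data_[i];
    }
    std::size_t size() const { return size_; }
    bool on_heap() const { return data_ != inline_; }

private:
    std::size_t size_;
    T inline_[InlineCapacity];
    T *data_;
};

// Example caller: there is no explicit free, so the early return below
// cannot leak even when the buffer was heap-allocated.
int sum_first_n(std::size_t n) {
    scratch_array<int, 32> buf(n);
    if (n == 0) return 0;
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        buf[i] = static_cast<int>(i);
        total += buf[i];
    }
    return total;
}
```

The equivalent C code would need a matching free on each return path, which is where leaks tend to creep in.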


Cheers,
Mark


 Cheers

 Ben


Re: [Numpy-discussion] How a transition to C++ could work

2012-02-19 Thread David Cournapeau
On Sun, Feb 19, 2012 at 9:52 AM, Mark Wiebe mwwi...@gmail.com wrote:
 On Sun, Feb 19, 2012 at 3:10 AM, Ben Walsh ben_w_...@yahoo.co.uk wrote:



  Date: Sun, 19 Feb 2012 01:18:20 -0600
  From: Mark Wiebe mwwi...@gmail.com
  Subject: [Numpy-discussion] How a transition to C++ could work
  To: Discussion of Numerical Python NumPy-Discussion@scipy.org
  Message-ID:
 
  CAMRnEmpVTmt=kdurpzktgui516oqtqd4vazm746hmpqgpfx...@mail.gmail.com
  Content-Type: text/plain; charset=utf-8
 
  The suggestion of transitioning the NumPy core code from C to C++ has
  sparked a vigorous debate, and I thought I'd start a new thread to give
  my
  perspective on some of the issues raised, and describe how such a
  transition could occur.
 
  First, I'd like to reiterate the gcc rationale for their choice to
  switch:
  http://gcc.gnu.org/wiki/gcc-in-cxx#Rationale
 
  In particular, these points deserve emphasis:
 
    - The C subset of C++ is just as efficient as C.
    - C++ supports cleaner code in several significant cases.
    - C++ makes it easier to write cleaner interfaces by making it harder
  to
    break interface boundaries.
    - C++ never requires uglier code.
 

 I think they're trying to solve a different problem.

 I thought the problem that numpy was trying to solve is make inner loops
 of numerical algorithms very fast. C is great for this because you can
 write C code and picture precisely what assembly code will be generated.


 What you're describing is also the C subset of C++, so your experience
 applies just as well to C++!


 C++ removes some of this advantage -- now there is extra code generated by
 the compiler to handle constructors, destructors, operators etc which can
 make a material difference to fast inner loops. So you end up just writing
 C-style anyway.


 This is in fact not true, and writing in C++ style can often produce faster
 code. A classic example of this is C qsort vs C++ std::sort. You may be
 thinking of using virtual functions in a class hierarchy, where a tradeoff
 between performance and run-time polymorphism is being done. Emulating the
 functionality that virtual functions provide in C will give similar
 performance characteristics as the C++ language feature itself.


 On the other hand, if your problem really is write lots of OO code with
 virtual methods and have it turned into machine code (probably like the
 GCC guys) then maybe C++ is the way to go.


 Managing the complexity of the dtype subsystem, the ufunc subsystem, the
 nditer component, and other parts of NumPy could benefit from C++ Not in a
 stereotypical OO code with virtual methods way, that is not how typical
 modern C++ is done.


 Some more opinions on C++:
 http://gigamonkeys.wordpress.com/2009/10/16/coders-c-plus-plus/

 Sorry if this all seems a bit negative about C++. It's just been my
 experience that C++ adds complexity while C keeps things nice and simple.


 Yes, there are lots of negative opinions about C++ out there, it's true.
 Just like there are negative opinions about C, Java, C#, and any other
 language which has become popular. My experience with regard to complexity
 and C vs C++ is that C forces the complexity of dealing with resource
 lifetimes out into all the code everyone writes, while C++ allows one to
 encapsulate that sort of complexity into a class which is small and more
 easily verifiable. This is about code quality, and the best quality C++ code
 I've worked with has been way easier to program in than the best quality C
 code I've worked with.

While I actually believe this to be true (very good C++ can be easier
to read/use than very good C), good C is also much more common than
good C++, at least in open source.

On the good C++ codebases you have been working on, could you rely on
everybody being a very good C++ programmer ? Because this will most
likely never happen for numpy. This is the crux of the argument from
an organizational POV: the variance in C++ code quality is much more
difficult to control. I have seen C++ code that is certainly much
poorer and more complex than numpy, to a point where not much could be
done to save the codebase.

cheers,

David


Re: [Numpy-discussion] How a transition to C++ could work

2012-02-19 Thread Mark Wiebe
On Sun, Feb 19, 2012 at 3:45 AM, David Cournapeau courn...@gmail.com wrote:

 On Sun, Feb 19, 2012 at 9:19 AM, Mark Wiebe mwwi...@gmail.com wrote:
  On Sun, Feb 19, 2012 at 2:56 AM, David Cournapeau courn...@gmail.com
  wrote:
 
  Hi Mark,
 
  thank you for joining this discussion.
 
  On Sun, Feb 19, 2012 at 7:18 AM, Mark Wiebe mwwi...@gmail.com wrote:
   The suggestion of transitioning the NumPy core code from C to C++ has
   sparked a vigorous debate, and I thought I'd start a new thread to
 give
   my
   perspective on some of the issues raised, and describe how such a
   transition
   could occur.
  
   First, I'd like to reiterate the gcc rationale for their choice to
   switch:
   http://gcc.gnu.org/wiki/gcc-in-cxx#Rationale
  
   In particular, these points deserve emphasis:
  
   The C subset of C++ is just as efficient as C.
   C++ supports cleaner code in several significant cases.
   C++ makes it easier to write cleaner interfaces by making it harder to
   break
   interface boundaries.
   C++ never requires uglier code.
 
  I think those arguments will not be very useful: they are subjective,
  and unlikely to convince people who prefer C to C++.
 
 
  They are arguments from a team which implement both a C and a C++
 compiler.
  In the spectrum of possible authorities on the matter, they rate about as
  high as I can imagine.

 There are quite a few arguments who are as authoritative and think
 those arguments are not very strong. They are as unlikely to change
 your mind as the gcc's arguments are unlikely to convince me I am
 afraid.


I imagine only points 2 and 3 are controversial for you; 1 and 4 are pretty
straightforward, yes? We could dig into the specifics of these points if
you'd like.


 
  This is a necessary part of providing a C API, and is included as a
  requirement of doing that. All C++ libraries which expose a C API deal
 with
  this.

 The only two given examples given so far for a C library around C++
 code (clang and zeromq) do not use exceptions. Can you provide an
 example of a C++ library that has a C API and does use exception ?


I couldn't find a nice example with a short search, unfortunately.


 If not, I would like to know the technical details if you don't mind
 expanding on them.


Sure. First, one would standardize on having all exceptions be derived from
std::exception (std::bad_alloc, which we discussed before, and all other
standard exceptions already are). Then, each function exposed to the C API
where internally C++ exceptions are used would look roughly like:

int api_function(int param1, float param2, PyArrayObject *param3)
{
    try {
        /* ... implementation ... */

        return 0;
    } catch (const std::bad_alloc&) {
        /* Most specific handlers first; catch by const reference
           so derived exception types are not sliced. */
        PyErr_NoMemory();
        return -1;
    } catch (const numpy::convergence_error& e) {
        PyErr_SetString(NpyExc_ConvergenceError, e.what());
        return -1;
    } catch (const std::exception& e) {
        PyErr_SetString(PyExc_RuntimeError, e.what());
        return -1;
    }
}
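The same pattern can be tried without the Python headers. The following is a self-contained sketch, assuming a plain string as a stand-in for the Python error indicator; every name in it (demo_divide, demo_last_error, the DEMO_* codes, checked_divide) is hypothetical. A negative first argument is used only to simulate an allocation failure so that path can be exercised.

```cpp
#include <cassert>
#include <exception>
#include <new>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the Python error indicator that
// PyErr_SetString / PyErr_NoMemory would set in real code.
static std::string g_last_error;

// Hypothetical status codes returned across the C boundary.
enum { DEMO_OK = 0, DEMO_NOMEM = -1, DEMO_ERROR = -2 };

// Internal C++ implementation: free to throw.
static int checked_divide(int a, int b) {
    if (b == 0) throw std::domain_error("division by zero");
    if (a < 0) throw std::bad_alloc();  // simulated allocation failure
    return a / b;
}

// C-callable wrapper: no exception is allowed to escape it.
extern "C" int demo_divide(int a, int b, int *out) {
    assert(out != nullptr);
    try {
        *out = checked_divide(a, b);
        return DEMO_OK;
    } catch (const std::bad_alloc &) {
        g_last_error = "out of memory";
        return DEMO_NOMEM;
    } catch (const std::exception &e) {
        g_last_error = e.what();
        return DEMO_ERROR;
    } catch (...) {
        // Safety net for anything not derived from std::exception.
        g_last_error = "unknown C++ exception";
        return DEMO_ERROR;
    }
}

extern "C" const char *demo_last_error(void) {
    return g_last_error.c_str();
}
```

A C caller only ever sees an int status and a const char* message, so nothing about the exception machinery leaks into the C API.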



 
 
  How can one make sure C++ extensions built
  by different compilers can work ?
 
 
  This is no different from the situation in C. Already in C on Windows,
 one
  can't build NumPy with a different version of Visual C++ than the one
 used
  to build CPython.

 This is a different situation. On windows, the mismatch between VS is
 due to the way win32 has been used by python itself - it could
 actually be fixed eventually by python (there are efforts in that
 regard). It is not a language issue.


I've already tried fixing this and building NumPy with Visual C++ 2010, and
the memory allocation/deallocation issues were pretty easy to fix. The
problem was that NumPy C code takes a FILE* object from a file opened from
within Python code. The root of the issue is when CPython uses a different
C runtime library (MSVCR##.dll) than NumPy.


 Except for that case, numpy has a pretty good record of allowing
 people to mix and match compilers. Using mingw on windows and intel
 compilers on linux are the typical cases, but not the only ones.


In these cases the compiler is adopting the name-mangling ABI of the
compiler it's matching. On Windows, the intel compiler uses the Visual C++
ABI, and on Linux, it uses the gcc ABI. But, since the CPython API is a C
API, things would still work fine even if the name-mangling were different.


 
  I would
  expect you would like to use templates for iterators in numpy - you
  can you make them available to 3rd party extensions without requiring
  C++.
 
 
  Yes, something like the nditer is a good example. From C, it would have
 to
  retain an API in the current style, but C++ users could gain an
  easier-to-use variant.

 Providing an official C++ library on top of the current C API would
 certainly be nice for people who prefer C++ to C. But this is quite
 different from using C++ at the core.


That's true.


 The current way iterators work would be very hard (if at all possible
 ?) to rewrite in idiomatic in C++ while keeping even API compatibility
 with 

Re: [Numpy-discussion] How a transition to C++ could work

2012-02-19 Thread Christopher Jordan-Squire
On Sun, Feb 19, 2012 at 2:14 AM, David Cournapeau courn...@gmail.com wrote:
 On Sun, Feb 19, 2012 at 9:52 AM, Mark Wiebe mwwi...@gmail.com wrote:
 On Sun, Feb 19, 2012 at 3:10 AM, Ben Walsh ben_w_...@yahoo.co.uk wrote:



  Date: Sun, 19 Feb 2012 01:18:20 -0600
  From: Mark Wiebe mwwi...@gmail.com
  Subject: [Numpy-discussion] How a transition to C++ could work
  To: Discussion of Numerical Python NumPy-Discussion@scipy.org
  Message-ID:
 
  CAMRnEmpVTmt=kdurpzktgui516oqtqd4vazm746hmpqgpfx...@mail.gmail.com
  Content-Type: text/plain; charset=utf-8
 
  The suggestion of transitioning the NumPy core code from C to C++ has
  sparked a vigorous debate, and I thought I'd start a new thread to give
  my
  perspective on some of the issues raised, and describe how such a
  transition could occur.
 
  First, I'd like to reiterate the gcc rationale for their choice to
  switch:
  http://gcc.gnu.org/wiki/gcc-in-cxx#Rationale
 
  In particular, these points deserve emphasis:
 
    - The C subset of C++ is just as efficient as C.
    - C++ supports cleaner code in several significant cases.
    - C++ makes it easier to write cleaner interfaces by making it harder
  to
    break interface boundaries.
    - C++ never requires uglier code.
 

 I think they're trying to solve a different problem.

 I thought the problem that numpy was trying to solve is make inner loops
 of numerical algorithms very fast. C is great for this because you can
 write C code and picture precisely what assembly code will be generated.


 What you're describing is also the C subset of C++, so your experience
 applies just as well to C++!


 C++ removes some of this advantage -- now there is extra code generated by
 the compiler to handle constructors, destructors, operators etc which can
 make a material difference to fast inner loops. So you end up just writing
 C-style anyway.


 This is in fact not true, and writing in C++ style can often produce faster
 code. A classic example of this is C qsort vs C++ std::sort. You may be
 thinking of using virtual functions in a class hierarchy, where a tradeoff
 between performance and run-time polymorphism is being done. Emulating the
 functionality that virtual functions provide in C will give similar
 performance characteristics as the C++ language feature itself.


 On the other hand, if your problem really is write lots of OO code with
 virtual methods and have it turned into machine code (probably like the
 GCC guys) then maybe C++ is the way to go.


 Managing the complexity of the dtype subsystem, the ufunc subsystem, the
 nditer component, and other parts of NumPy could benefit from C++ Not in a
 stereotypical OO code with virtual methods way, that is not how typical
 modern C++ is done.


 Some more opinions on C++:
 http://gigamonkeys.wordpress.com/2009/10/16/coders-c-plus-plus/

 Sorry if this all seems a bit negative about C++. It's just been my
 experience that C++ adds complexity while C keeps things nice and simple.


 Yes, there are lots of negative opinions about C++ out there, it's true.
 Just like there are negative opinions about C, Java, C#, and any other
 language which has become popular. My experience with regard to complexity
 and C vs C++ is that C forces the complexity of dealing with resource
 lifetimes out into all the code everyone writes, while C++ allows one to
 encapsulate that sort of complexity into a class which is small and more
 easily verifiable. This is about code quality, and the best quality C++ code
 I've worked with has been way easier to program in than the best quality C
 code I've worked with.

 While I actually believe this to be true (very good C++ can be easier
 to read/use than very good C). Good C is also much more common than
 good C++, at least in open source.

 On the good C++ codebases you have been working on, could you rely on
 everybody being a very good C++ programmer ? Because this will most
 likely never happen for numpy. This is the crux of the argument from
 an organizational POV: the variance in C++ code quality is much more
 difficult to control. I have seen C++ code that is certainly much
 poorer and more complex than numpy, to a point where not much could be
 done to save the codebase.


Can this possibly be extended to the following: How will Mark's
(extensive) experience about performance and long-term consequences of
design decisions be communicated to future developers? We not only
want new numpy developers, we want them to write good code without
unintentional performance regressions. It seems like something more
than just code guidelines would be required.

There's also the issue that c++ compilation error messages can be
awful and disheartening. Are there ways of making them not as bad by
following certain coding styles, or is that baked in? (I know clang is
moving towards making them much better, though.)

-Chris

 cheers,

 David

Re: [Numpy-discussion] How a transition to C++ could work

2012-02-19 Thread Mark Wiebe
On Sun, Feb 19, 2012 at 4:14 AM, David Cournapeau courn...@gmail.com wrote:

 On Sun, Feb 19, 2012 at 9:52 AM, Mark Wiebe mwwi...@gmail.com wrote:
  On Sun, Feb 19, 2012 at 3:10 AM, Ben Walsh ben_w_...@yahoo.co.uk
 wrote:
 
 
 
   Date: Sun, 19 Feb 2012 01:18:20 -0600
   From: Mark Wiebe mwwi...@gmail.com
   Subject: [Numpy-discussion] How a transition to C++ could work
   To: Discussion of Numerical Python NumPy-Discussion@scipy.org
   Message-ID:
  
   CAMRnEmpVTmt=kdurpzktgui516oqtqd4vazm746hmpqgpfx...@mail.gmail.com
   Content-Type: text/plain; charset=utf-8
  
   The suggestion of transitioning the NumPy core code from C to C++ has
   sparked a vigorous debate, and I thought I'd start a new thread to
 give
   my
   perspective on some of the issues raised, and describe how such a
   transition could occur.
  
   First, I'd like to reiterate the gcc rationale for their choice to
   switch:
   http://gcc.gnu.org/wiki/gcc-in-cxx#Rationale
  
   In particular, these points deserve emphasis:
  
 - The C subset of C++ is just as efficient as C.
 - C++ supports cleaner code in several significant cases.
 - C++ makes it easier to write cleaner interfaces by making it
 harder
   to
 break interface boundaries.
 - C++ never requires uglier code.
  
 
  I think they're trying to solve a different problem.
 
  I thought the problem that numpy was trying to solve is make inner
 loops
  of numerical algorithms very fast. C is great for this because you can
  write C code and picture precisely what assembly code will be generated.
 
 
  What you're describing is also the C subset of C++, so your experience
  applies just as well to C++!
 
 
  C++ removes some of this advantage -- now there is extra code generated
 by
  the compiler to handle constructors, destructors, operators etc which
 can
  make a material difference to fast inner loops. So you end up just
 writing
  C-style anyway.
 
 
  This is in fact not true, and writing in C++ style can often produce
 faster
  code. A classic example of this is C qsort vs C++ std::sort. You may be
  thinking of using virtual functions in a class hierarchy, where a
 tradeoff
  between performance and run-time polymorphism is being done. Emulating
 the
  functionality that virtual functions provide in C will give similar
  performance characteristics as the C++ language feature itself.
 
 
  On the other hand, if your problem really is write lots of OO code with
  virtual methods and have it turned into machine code (probably like the
  GCC guys) then maybe C++ is the way to go.
 
 
  Managing the complexity of the dtype subsystem, the ufunc subsystem, the
  nditer component, and other parts of NumPy could benefit from C++ Not in
 a
  stereotypical OO code with virtual methods way, that is not how typical
  modern C++ is done.
 
 
  Some more opinions on C++:
  http://gigamonkeys.wordpress.com/2009/10/16/coders-c-plus-plus/
 
  Sorry if this all seems a bit negative about C++. It's just been my
  experience that C++ adds complexity while C keeps things nice and
 simple.
 
 
  Yes, there are lots of negative opinions about C++ out there, it's true.
  Just like there are negative opinions about C, Java, C#, and any other
  language which has become popular. My experience with regard to
 complexity
  and C vs C++ is that C forces the complexity of dealing with resource
  lifetimes out into all the code everyone writes, while C++ allows one to
  encapsulate that sort of complexity into a class which is small and more
  easily verifiable. This is about code quality, and the best quality C++
 code
  I've worked with has been way easier to program in than the best quality
 C
  code I've worked with.

 While I actually believe this to be true (very good C++ can be easier
 to read/use than very good C). Good C is also much more common than
 good C++, at least in open source.

 On the good C++ codebases you have been working on, could you rely on
 everybody being a very good C++ programmer?


Not initially, but I designed the coding standards and taught the
programmers I hired how to write good C++ code.


 Because this will most
 likely never happen for numpy.


This is the role I see good coding standards and consistent code review
playing. Programmers who don't know how to write good C++ code can be
taught. There are also good books to read, like C++ Coding Standards,
Effective C++, and others that can help people learn proper technique.


 This is the crux of the argument from
 an organizational POV: the variance in C++ code quality is much more
 difficult to control. I have seen C++ code that is certainly much
 poorer and more complex than numpy, to a point where not much could be
 done to save the codebase.


That's a consequence of the power C++ provides. It assumes the programmer
knows what he or she is doing, and provides the tools to make things great
or shoot oneself in the foot. I'd like to use that power to make NumPy
better, in a way which uses high

Re: [Numpy-discussion] How a transition to C++ could work

2012-02-19 Thread Mark Wiebe
On Sun, Feb 19, 2012 at 4:30 AM, Christopher Jordan-Squire
cjord...@uw.edu wrote:

 On Sun, Feb 19, 2012 at 2:14 AM, David Cournapeau courn...@gmail.com
 wrote:
  On Sun, Feb 19, 2012 at 9:52 AM, Mark Wiebe mwwi...@gmail.com wrote:
  On Sun, Feb 19, 2012 at 3:10 AM, Ben Walsh ben_w_...@yahoo.co.uk
 wrote:
 
 
 
   Date: Sun, 19 Feb 2012 01:18:20 -0600
   From: Mark Wiebe mwwi...@gmail.com
   Subject: [Numpy-discussion] How a transition to C++ could work
   To: Discussion of Numerical Python NumPy-Discussion@scipy.org
   Message-ID:
  
   CAMRnEmpVTmt=kdurpzktgui516oqtqd4vazm746hmpqgpfx...@mail.gmail.com
   Content-Type: text/plain; charset=utf-8
  
   The suggestion of transitioning the NumPy core code from C to C++ has
   sparked a vigorous debate, and I thought I'd start a new thread to
 give
   my
   perspective on some of the issues raised, and describe how such a
   transition could occur.
  
   First, I'd like to reiterate the gcc rationale for their choice to
   switch:
   http://gcc.gnu.org/wiki/gcc-in-cxx#Rationale
  
   In particular, these points deserve emphasis:
  
 - The C subset of C++ is just as efficient as C.
 - C++ supports cleaner code in several significant cases.
 - C++ makes it easier to write cleaner interfaces by making it
 harder
   to
 break interface boundaries.
 - C++ never requires uglier code.
  
 
  I think they're trying to solve a different problem.
 
  I thought the problem that numpy was trying to solve is make inner
 loops
  of numerical algorithms very fast. C is great for this because you can
  write C code and picture precisely what assembly code will be
 generated.
 
 
  What you're describing is also the C subset of C++, so your experience
  applies just as well to C++!
 
 
  C++ removes some of this advantage -- now there is extra code
 generated by
  the compiler to handle constructors, destructors, operators etc which
 can
  make a material difference to fast inner loops. So you end up just
 writing
  C-style anyway.
 
 
  This is in fact not true, and writing in C++ style can often produce
 faster
  code. A classic example of this is C qsort vs C++ std::sort. You may be
  thinking of using virtual functions in a class hierarchy, where a
 tradeoff
  between performance and run-time polymorphism is being done. Emulating
 the
  functionality that virtual functions provide in C will give similar
  performance characteristics as the C++ language feature itself.
 
 
  On the other hand, if your problem really is write lots of OO code
 with
  virtual methods and have it turned into machine code (probably like
 the
  GCC guys) then maybe C++ is the way to go.
 
 
  Managing the complexity of the dtype subsystem, the ufunc subsystem, the
  nditer component, and other parts of NumPy could benefit from C++ Not
 in a
  stereotypical OO code with virtual methods way, that is not how
 typical
  modern C++ is done.
 
 
  Some more opinions on C++:
  http://gigamonkeys.wordpress.com/2009/10/16/coders-c-plus-plus/
 
  Sorry if this all seems a bit negative about C++. It's just been my
  experience that C++ adds complexity while C keeps things nice and
 simple.
 
 
  Yes, there are lots of negative opinions about C++ out there, it's true.
  Just like there are negative opinions about C, Java, C#, and any other
  language which has become popular. My experience with regard to
 complexity
  and C vs C++ is that C forces the complexity of dealing with resource
  lifetimes out into all the code everyone writes, while C++ allows one to
  encapsulate that sort of complexity into a class which is small and more
  easily verifiable. This is about code quality, and the best quality C++
 code
  I've worked with has been way easier to program in than the best
 quality C
  code I've worked with.
 
  While I actually believe this to be true (very good C++ can be easier
  to read/use than very good C), good C is also much more common than
  good C++, at least in open source.
 
  On the good C++ codebases you have been working on, could you rely on
  everybody being a very good C++ programmer? Because this will most
  likely never happen for numpy. This is the crux of the argument from
  an organizational POV: the variance in C++ code quality is much more
  difficult to control. I have seen C++ code that is certainly much
  poorer and more complex than numpy, to a point where not much could be
  done to save the codebase.
 

 Can this possibly be extended to the following: How will Mark's
 (extensive) experience about performance and long-term consequences of
 design decisions be communicated to future developers? We not only
 want new numpy developers, we want them to write good code without
 unintentional performance regressions. It seems like something more
 than just code guidelines would be required.


I've tried to set a bit of an example to start with the NEPs I've written.
The NEPs for both the nditer and the NA functionality are very long and
detailed. Some

Re: [Numpy-discussion] How a transition to C++ could work

2012-02-19 Thread Sturla Molden


On 19 Feb 2012, at 09:51, Stéfan van der Walt ste...@sun.ac.za wrote:

 
 OK, so let's talk specifics: how do you dynamically grab a function pointer 
 to a compiled C++ library, a la ctypes? Feel free to point me to 
 StackOverflow or elsewhere.
 

You declare the function with the signature extern "C".
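
A small sketch of what that looks like, with hypothetical names (example_mean
is not a NumPy function). The implementation is free to use C++ internally;
only the exported entry point is declared extern "C", which turns off name
mangling so the symbol can be looked up by its plain name.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

namespace {

// Internal C++ implementation; not visible outside this translation unit.
double mean_impl(const std::vector<double>& v) {
    if (v.empty()) return 0.0;
    return std::accumulate(v.begin(), v.end(), 0.0) / v.size();
}

}  // namespace

// extern "C" gives the function C linkage: the symbol is exported as
// "example_mean" rather than a compiler-specific mangled name, so a
// dynamic loader can find it by that name.
extern "C" double example_mean(const double* values, int n) {
    if (values == nullptr || n <= 0) return 0.0;
    return mean_impl(std::vector<double>(values, values + n));
}
```

Built into a shared library, the function could then be fetched from Python
with ctypes.CDLL and called after setting restype to ctypes.c_double and
argtypes to match. The library file name would depend on your build, so none
is given here.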


Sturla


Re: [Numpy-discussion] How a transition to C++ could work

2012-02-19 Thread Adam Klein
On Feb 19, 2012, at 2:18 AM, Mark Wiebe mwwi...@gmail.com wrote:

The suggestion of transitioning the NumPy core code from C to C++ has
sparked a vigorous debate, and I thought I'd start a new thread to give my
perspective on some of the issues raised, and describe how such a
transition could occur.

First, I'd like to reiterate the gcc rationale for their choice to switch:
http://gcc.gnu.org/wiki/gcc-in-cxx#Rationale

In particular, these points deserve emphasis:

   - The C subset of C++ is just as efficient as C.
   - C++ supports cleaner code in several significant cases.
   - C++ makes it easier to write cleaner interfaces by making it harder to
   break interface boundaries.
   - C++ never requires uglier code.

Some people have pointed out that the Python templating preprocessor used
in NumPy is suggestive of C++ templates. A nice advantage of using C++
templates instead of this preprocessor is that third party tools to improve
software quality, like static analysis tools, will be able to run directly
on the NumPy source code. Additionally, IDEs like XCode and Visual C++ will
be able to provide the full suite of tab-completion/intellisense features
that programmers working in those environments are accustomed to.

There are concerns about ABI/API interoperability and interactions with C++
exceptions. I've dealt with these types of issues on enough platforms to
know that while they're important, they're a lot easier to handle than the
issues with Fortran, BLAS, and LAPACK in SciPy. My experience has been that
providing a C API from a C++ library is no harder than providing a C API
from a C library.

It's worth comparing the possibility of C++ versus the possibility of other
languages, and the ones that have been suggested for consideration are D,
Cython, Rust, Fortran 2003, Go, RPython, C# and Java. The target language
has to interact naturally with the CPython API. It needs to provide direct
access to all the various sizes of signed int, unsigned int, and float. It
needs to have mature compiler support wherever we want to deploy NumPy.
Taken together, these requirements eliminate a majority of these
possibilities. From these criteria, the only languages which seem to have a
clear possibility for the implementation of Numpy are C, C++, and D. For D,
I suspect the tooling is not mature enough, but I'm not 100% certain of
that.


I am a huge fan of D, but you are dead on about its tooling, so +1 on the
observation. Its code generation, especially with respect to floating point,
is also a known area needing improvement, IIRC.

The biggest question for any of these possibilities is how do you get the
code from its current state to a state which fully utilizes the target
language. C++, being nearly a superset of C, offers a strategy to gradually
absorb C++ features. Any of the other language choices requires a rewrite,
which would be quite disruptive. Because of all these reasons taken
together, I believe the only realistic language to use, other than sticking
with C, is C++.

Finally, here's what I think is the best strategy for transitioning to C++.
First, let's consider what we do if 1.7 becomes an LTS release.

1) Immediately after branching for 1.7, we minimally patch all the .c files
so that they can build with a C++ compiler and with a C compiler at the
same time. Then we rename all .c -> .cpp, and update the build systems for
C++.
2) During the 1.8 development cycle, we heavily restrict C++ feature usage.
But, where a feature implementation would be arguably easier and less
error-prone with C++, we allow it. This is a period for learning about C++
and how it can benefit NumPy.
3) After the 1.8 release, the community will have developed more experience
with C++, and will be in a better position to discuss a way forward.

If, for some reason, a 1.7 LTS is unacceptable, it might be a good idea to
restrict the 1.8 release to the subset of both C and C++. I would much
prefer using the 1.8 development cycle to dip our toes into the C++ world
to get some of the low-hanging benefits without doing anything disruptive.

A really important point to emphasize is that C++ allows for a strategy
where we gradually evolve the codebase to better incorporate its language
features. This is what I'm advocating. No massive rewrite, no disruptive
changes. Gradual code evolution, with ABI and API compatibility comparable
to what we've delivered in 1.6 and the upcoming 1.7 releases.

Thanks,
Mark



Re: [Numpy-discussion] How a transition to C++ could work

2012-02-19 Thread Sturla Molden
On 19.02.2012 11:30, Christopher Jordan-Squire wrote:

 Can this possibly be extended to the following: How will Mark's
 (extensive) experience about performance and long-term consequences of
 design decisions be communicated to future developers? We not only
 want new numpy developers, we want them to write good code without
 unintentional performance regressions. It seems like something more
 than just code guidelines would be required.

There are more examples of crappy than good C++ out there. There
are tons of literature on how to write crappy C++. And most
programmers do not have the skill or knowledge to write good C++.

My biggest issue with C++ is the variability of skills among
programmers. It will result in code that is:

- unnecessarily complex
- ugly looking
- difficult to understand
- verbose and long
- inefficient
- full of subtle errors
- impossible to debug
- impossible to maintain
- not scalable with hardware
- dependent on one particular compiler

It is easier to achieve this with C++ than C. But it is also
easier to avoid. Double-edged sword.

It will take more than guidelines.

Sturla


Re: [Numpy-discussion] How a transition to C++ could work

2012-02-19 Thread Sturla Molden

On 19.02.2012 10:52, Mark Wiebe wrote:


C++ removes some of this advantage -- now there is extra code generated by
the compiler to handle constructors, destructors, operators etc which can
make a material difference to fast inner loops. So you end up just writing
C-style anyway.


This is in fact not true, and writing in C++ style can often produce 
faster code. A classic example of this is C qsort vs C++ std::sort. 
You may be thinking of using virtual functions in a class hierarchy, 
where a tradeoff between performance and run-time polymorphism is 
being done. Emulating the functionality that virtual functions provide 
in C will give similar performance characteristics as the C++ language 
feature itself.
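
As an illustration of the point about virtual functions, here is a sketch
with invented shape types. The C-style version stores a function pointer in
each object; the C++ version uses a virtual method. In both cases the loop
makes one indirect call per element, which is why the performance
characteristics are expected to be similar.

```cpp
#include <cassert>

// C-style emulation: each object carries a function pointer that plays
// the role of a one-entry virtual table. All names are hypothetical.
struct CShape {
    double (*area)(const CShape* self);
    double w;
    double h;
};

double rect_area(const CShape* s) { return s->w * s->h; }
double tri_area(const CShape* s) { return 0.5 * s->w * s->h; }

double total_area_c(const CShape* shapes, int n) {
    double total = 0.0;
    for (int i = 0; i < n; ++i) {
        total += shapes[i].area(&shapes[i]);  // indirect call via pointer
    }
    return total;
}

// C++ version: the compiler generates and maintains the table.
struct Shape {
    virtual ~Shape() {}
    virtual double area() const = 0;
};

struct Rect : Shape {
    double w, h;
    Rect(double w_, double h_) : w(w_), h(h_) {}
    double area() const override { return w * h; }
};

struct Tri : Shape {
    double w, h;
    Tri(double w_, double h_) : w(w_), h(h_) {}
    double area() const override { return 0.5 * w * h; }
};

double total_area_cpp(const Shape* const* shapes, int n) {
    double total = 0.0;
    for (int i = 0; i < n; ++i) {
        total += shapes[i]->area();  // indirect call via the vtable
    }
    return total;
}
```

The difference is mostly in who maintains the dispatch table: in C it is
written and kept consistent by hand, in C++ the compiler does it.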


I agree with Mark here. C++ usually produces the faster code. C++ has 
abstractions that make it easier to write more efficient code. C++ 
provides more and better information to the compiler (e.g. strict 
aliasing rules). C++ compilers are also getting insanely good at 
optimisation, usually better than C compilers. But C++ also makes it 
easy to write sluggish bloatware, so the effect on performance is not 
predictable.


Sturla



Re: [Numpy-discussion] How a transition to C++ could work

2012-02-19 Thread Adam Klein
On Feb 19, 2012, at 10:38 AM, Sturla Molden stu...@molden.no wrote:

 On 19.02.2012 10:52, Mark Wiebe wrote:


 C++ removes some of this advantage -- now there is extra code generated by
 the compiler to handle constructors, destructors, operators etc which can
 make a material difference to fast inner loops. So you end up just writing
 C-style anyway.


 This is in fact not true, and writing in C++ style can often produce
faster code. A classic example of this is C qsort vs C++ std::sort. You may
be thinking of using virtual functions in a class hierarchy, where a
tradeoff between performance and run-time polymorphism is being done.
Emulating the functionality that virtual functions provide in C will give
similar performance characteristics as the C++ language feature itself.


I agree with Mark here. C++ usually produces the faster code. C++ has
abstractions that make it easier to write more efficient code. C++
provides more and better information to the compiler (e.g. strict aliasing
rules). C++ compilers are also getting insanely good at optimisation,
usually better than C compilers. But C++ also makes it easy to write
sluggish bloatware, so the effect on performance is not predictable.


Just to add, with respect to acceptable compilation times, a judicious
choice of C++ features is critical.

Sturla



Re: [Numpy-discussion] How a transition to C++ could work

2012-02-19 Thread Sturla Molden
On 19.02.2012 16:45, Adam Klein wrote:

 Just to add, with respect to acceptable compilation times, a judicious 
 choice of C++ features is critical.


I use Python to avoid recompiling my code all the time. I don't 
recompile NumPy every time I use it.

(I know you are thinking about development, but you have the wrong 
perspective.)


Sturla








Re: [Numpy-discussion] How a transition to C++ could work

2012-02-19 Thread Ralf Gommers
On Sun, Feb 19, 2012 at 4:53 PM, Sturla Molden stu...@molden.no wrote:

 On 19.02.2012 16:45, Adam Klein wrote:
 
  Just to add, with respect to acceptable compilation times, a judicious
  choice of C++ features is critical.
 

 I use Python to avoid recompiling my code all the time. I don't
 recompile NumPy every time I use it.

 (I know you are thinking about development, but you have the wrong
 perspective.)


No, he doesn't. Perspectives aren't wrong, just different.

I compile both numpy and scipy on a regular (almost daily) basis, and long
compile times are very annoying.

Ralf


Re: [Numpy-discussion] How a transition to C++ could work

2012-02-19 Thread Stéfan van der Walt
On Feb 19, 2012 2:41 AM, Mark Wiebe mwwi...@gmail.com wrote:

 This is the role I see good coding standards and consistent code review
playing. Programmers who don't know how to write good C++ code can be
taught. There are also good books to read, like C++ Coding Standards,
Effective C++, and others that can help people learn proper technique.

I recommend this book (one of those in the list above) to anyone who is not
afraid of C++ yet:

http://search.barnesandnoble.com/Effective-C/Scott-Meyers/e/9780321334879

With great power comes great responsibility.

Stéfan


[Numpy-discussion] How a transition to C++ could work

2012-02-18 Thread Mark Wiebe
The suggestion of transitioning the NumPy core code from C to C++ has
sparked a vigorous debate, and I thought I'd start a new thread to give my
perspective on some of the issues raised, and describe how such a
transition could occur.

First, I'd like to reiterate the gcc rationale for their choice to switch:
http://gcc.gnu.org/wiki/gcc-in-cxx#Rationale

In particular, these points deserve emphasis:

   - The C subset of C++ is just as efficient as C.
   - C++ supports cleaner code in several significant cases.
   - C++ makes it easier to write cleaner interfaces by making it harder to
   break interface boundaries.
   - C++ never requires uglier code.

Some people have pointed out that the Python templating preprocessor used
in NumPy is suggestive of C++ templates. A nice advantage of using C++
templates instead of this preprocessor is that third party tools to improve
software quality, like static analysis tools, will be able to run directly
on the NumPy source code. Additionally, IDEs like XCode and Visual C++ will
be able to provide the full suite of tab-completion/intellisense features
that programmers working in those environments are accustomed to.
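
As a rough sketch of what replacing the preprocessor with templates could
look like, here is a typed element-wise loop. The function add_arrays is
hypothetical and much simpler than a real NumPy inner loop (no strides, no
casting). The point is only that one definition covers several element types
and is ordinary C++ that analysis tools and IDEs can parse directly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// One definition, written once, for every element type T that supports +.
// A text-substitution preprocessor would instead emit one copy per type
// before the compiler ever sees the code.
template <typename T>
void add_arrays(const T* a, const T* b, T* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}

// Explicit instantiations: these force the compiler to generate the
// concrete versions here, roughly mirroring a fixed list of types.
template void add_arrays<std::int32_t>(const std::int32_t*,
                                       const std::int32_t*,
                                       std::int32_t*, std::size_t);
template void add_arrays<double>(const double*, const double*,
                                 double*, std::size_t);

// The element type is deduced from the arguments at the call site.
void demo_add() {
    const std::int32_t a[] = {1, 2, 3};
    const std::int32_t b[] = {10, 20, 30};
    std::int32_t out[3] = {0, 0, 0};
    add_arrays(a, b, out, 3);
    assert(out[0] == 11 && out[1] == 22 && out[2] == 33);
}
```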

There are concerns about ABI/API interoperability and interactions with C++
exceptions. I've dealt with these types of issues on enough platforms to
know that while they're important, they're a lot easier to handle than the
issues with Fortran, BLAS, and LAPACK in SciPy. My experience has been that
providing a C API from a C++ library is no harder than providing a C API
from a C library.
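
One common way to handle the exception concern, sketched here with
hypothetical names and error codes: every exported function has C linkage and
catches all C++ exceptions before returning, translating them into an integer
status. No exception ever crosses the C boundary.

```cpp
#include <cassert>
#include <new>
#include <stdexcept>

namespace {

// Internal C++ code is free to throw.
double checked_ratio(double num, double den) {
    if (den == 0.0) throw std::invalid_argument("division by zero");
    return num / den;
}

}  // namespace

// Hypothetical status codes for the C API.
enum {
    EXAMPLE_OK = 0,
    EXAMPLE_BAD_ARG = 1,
    EXAMPLE_NO_MEMORY = 2,
    EXAMPLE_UNKNOWN = 3
};

// C entry point: the try/catch block is the boundary. Callers written in
// C (or Python via the C API) only ever see a status code.
extern "C" int example_ratio(double num, double den, double* out) {
    try {
        if (out == nullptr) return EXAMPLE_BAD_ARG;
        *out = checked_ratio(num, den);
        return EXAMPLE_OK;
    } catch (const std::invalid_argument&) {
        return EXAMPLE_BAD_ARG;
    } catch (const std::bad_alloc&) {
        return EXAMPLE_NO_MEMORY;
    } catch (...) {
        return EXAMPLE_UNKNOWN;
    }
}
```

In an extension module the catch blocks would more likely set a Python error
and return NULL or -1, but the structure would be the same.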

It's worth comparing the possibility of C++ versus the possibility of other
languages, and the ones that have been suggested for consideration are D,
Cython, Rust, Fortran 2003, Go, RPython, C# and Java. The target language
has to interact naturally with the CPython API. It needs to provide direct
access to all the various sizes of signed int, unsigned int, and float. It
needs to have mature compiler support wherever we want to deploy NumPy.
Taken together, these requirements eliminate a majority of these
possibilities. From these criteria, the only languages which seem to have a
clear possibility for the implementation of Numpy are C, C++, and D. For D,
I suspect the tooling is not mature enough, but I'm not 100% certain of
that.

The biggest question for any of these possibilities is how do you get the
code from its current state to a state which fully utilizes the target
language. C++, being nearly a superset of C, offers a strategy to gradually
absorb C++ features. Any of the other language choices requires a rewrite,
which would be quite disruptive. Because of all these reasons taken
together, I believe the only realistic language to use, other than sticking
with C, is C++.

Finally, here's what I think is the best strategy for transitioning to C++.
First, let's consider what we do if 1.7 becomes an LTS release.

1) Immediately after branching for 1.7, we minimally patch all the .c files
so that they can build with a C++ compiler and with a C compiler at the
same time. Then we rename all .c -> .cpp, and update the build systems for
C++.
2) During the 1.8 development cycle, we heavily restrict C++ feature usage.
But, where a feature implementation would be arguably easier and less
error-prone with C++, we allow it. This is a period for learning about C++
and how it can benefit NumPy.
3) After the 1.8 release, the community will have developed more experience
with C++, and will be in a better position to discuss a way forward.
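
To give a feel for the "minimal patching" in step 1, here is a sketch of code
written in the common subset of C and C++. The helper names are made up. The
typical changes are casting the result of malloc (C++ has no implicit
conversion from void*), avoiding identifiers that are C++ keywords, and
wrapping declarations in extern "C" when compiled as C++.

```cpp
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#ifdef __cplusplus
extern "C" {
#endif

/* Hypothetical helper; compiles unchanged as C and as C++. */
double *example_copy(const double *src, size_t n)
{
    /* C accepts the implicit void* -> double* conversion; C++ needs the
     * explicit cast, which is also valid C. */
    double *dst = (double *)malloc(n * sizeof(double));
    if (dst == NULL) {
        return NULL;
    }
    memcpy(dst, src, n * sizeof(double));
    return dst;
}

/* Identifiers such as "class", "new" or "template" are legal in C but
 * reserved in C++, so they are renamed (here: "klass"). */
int example_same_kind(int klass, int other_klass)
{
    return klass == other_klass;
}

#ifdef __cplusplus
}
#endif
```

These are only the most common incompatibilities. A real patch would be
driven by whatever the C++ compiler actually rejects.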

If, for some reason, a 1.7 LTS is unacceptable, it might be a good idea to
restrict the 1.8 release to the subset of both C and C++. I would much
prefer using the 1.8 development cycle to dip our toes into the C++ world
to get some of the low-hanging benefits without doing anything disruptive.

A really important point to emphasize is that C++ allows for a strategy
where we gradually evolve the codebase to better incorporate its language
features. This is what I'm advocating. No massive rewrite, no disruptive
changes. Gradual code evolution, with ABI and API compatibility comparable
to what we've delivered in 1.6 and the upcoming 1.7 releases.

Thanks,
Mark