Hi numpy developers,

First of all, thanks a lot for the hard work you put into numpy. I know very well that maintaining such a core library is a lot of effort and a service to the community. But "with great dedication comes great responsibility" :).

I find that numpy is a bit of a wild horse, a moving target. I have just fixed a fairly nasty bug in scikit-learn [1] that was introduced by a change of semantics in ordering when doing copies with numpy. I have been running and developing scikit-learn while tracking numpy's development tree and, as far as I can tell, I never saw warnings raised in our code that something was going to change, or had changed.

In other settings, changes in array inheritance and 'base' propagation have made impossible some of our memmap-related use cases that used to work under previous numpy versions [2]. Others have been hitting difficulties related to these changes in behavior [3]. Not to mention the new casting rules (default: 'same_kind') that break a lot of code, or the ABI change that, while not done on purpose, ended up causing us a lot of pain.

My point here is that having code that works and gives correct results with new releases of numpy is more challenging than it should be. I cannot claim that I disagree with the changes that I mention above. They were all implemented for a good reason and can all be considered overall improvements to numpy. However, the situation is that, given a complex codebase relying on numpy that works at a time t, the chances that it works flawlessly at time t + 1 year are thin.

I am not too proud that we managed to release scikit-learn 0.12 with a very ugly bug under numpy 1.7. That happened although we have 90% test coverage, buildbots under different numpy versions, and a lot of people, including me, using our development tree on a day-to-day basis with bleeding-edge numpy. Most code in research settings or in industrial R&D does not benefit from such software engineering and, I believe, is much more likely to suffer from changes in numpy.

I think that this is a cultural issue: priority is not given to stability and backward compatibility.

I think that this culture is very much ingrained in the Python world, which likes iteratively cleaning up its software design. For instance, I have the feeling that in scikit-learn we probably fall into the same trap. That said, such behavior cannot fare well for a base scientific environment. People tell me that if they take old matlab code, the odds that it will still work are much higher than with Python code. As a geek, I tend to reply that we get a lot out of this mobility, because we accumulate less cruft. However, in research settings, for reproducibility reasons, one needs to be able to pick up an old codebase and trust its results without knowing its intricacies.

From a practical standpoint, I believe that people implementing large changes to the numpy codebase, or any other core scipy package, should think really hard about their impact. I do realise that the changes are discussed on the mailing lists, but there is a lot of activity to follow and I don't believe that it is possible for many of us to monitor the discussions. Also, putting more emphasis on backward compatibility is possible. For instance, the 'order' parameter added to np.copy could have defaulted to the old behavior, 'K', for a year, with a DeprecationWarning. The same goes for the casting rules.

Thank you for reading this long email. I don't mean it to be a complaint about the past, but more a suggestion on something to keep in mind when making changes to core projects.

Cheers,

Gaël

____
[1] https://github.com/scikit-learn/scikit-learn/commit/7842748cf777412c506a8c0ed28090711d3a3783
[2] http://mail.scipy.org/pipermail/numpy-discussion/2012-September/063985.html
[3] http://mail.scipy.org/pipermail/numpy-discussion/2012-July/063126.html
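
To illustrate the transition path suggested for the 'order' parameter, here is a minimal sketch of a new keyword that keeps the old default for a while and warns when the caller relies on it. The wrapper name `copy_compat` and the constants `_OLD_DEFAULT` and `_NEW_DEFAULT` are hypothetical and not part of numpy. The value 'K' for the old behavior is taken from the email, while 'C' as the future default is an assumption made only for this example.

```python
import warnings

import numpy as np

# Assumptions for illustration only (not numpy internals):
_OLD_DEFAULT = "K"  # old behavior, as stated in the email
_NEW_DEFAULT = "C"  # hypothetical future default


def copy_compat(a, order=None):
    """Hypothetical copy wrapper with a deprecation period for `order`.

    When `order` is not given, the old default is kept and a
    DeprecationWarning announces the upcoming change. Passing `order`
    explicitly silences the warning and pins the memory layout.
    """
    if order is None:
        warnings.warn(
            "copy_compat: the default value of 'order' will change from "
            f"{_OLD_DEFAULT!r} to {_NEW_DEFAULT!r} in a future release; "
            "pass order= explicitly to keep the current behavior.",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller's line
        )
        order = _OLD_DEFAULT
    # np.array with copy=True always returns a new array in the
    # requested memory layout.
    return np.array(a, order=order, copy=True)
```

With such a scheme, downstream code that depends on a particular memory layout would see a warning in its test suite during the deprecation period, and could pass `order=` explicitly before the default actually changes.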
