[Numpy-discussion] Requesting Code Review of nanmedian ENH
Hi everyone, I put together a np.nanmedian function to extend np.median to handle nans. Could someone review this code and give me some feedback on it before I submit a pull request for it? https://github.com/dfreese/numpy/compare/master...feature;nanmedian Thanks, David ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Requesting Code Review of nanmedian ENH
hi david, I havnt run the code; but the _replace_nan(0) call worries me; especially considering that the unit tests seem to deal with positive numbers exclusively. Have you tested with mixed positive/negative inputs? On Sun, Feb 16, 2014 at 6:13 PM, David Freese dfre...@stanford.edu wrote: Hi everyone, I put together a np.nanmedian function to extend np.median to handle nans. Could someone review this code and give me some feedback on it before I submit a pull request for it? https://github.com/dfreese/numpy/compare/master...feature;nanmedian Thanks, David ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Requesting Code Review of nanmedian ENH
On Sun, Feb 16, 2014 at 12:13 PM, David Freese dfre...@stanford.edu wrote: Hi everyone, I put together a np.nanmedian function to extend np.median to handle nans. Could someone review this code and give me some feedback on it before I submit a pull request for it? It looks good to submit as a pull request but probably will need some changes like the mixed sign thing already mentioned, and I see mean vs. median copypaste remnants in the docstring. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Requesting Code Review of nanmedian ENH
the 0s put into the array copy arr are not used in computation. The _replace_nan call is used primarily to generate a mask of the NaNs and make sure it passes the mutation test. I updated the unit tests to reflect negative values, which works. (and the documentation should be cleaned up now) https://github.com/dfreese/numpy/compare/master...feature;nanmedian On Sun, Feb 16, 2014 at 9:52 AM, alex argri...@ncsu.edu wrote: On Sun, Feb 16, 2014 at 12:13 PM, David Freese dfre...@stanford.edu wrote: Hi everyone, I put together a np.nanmedian function to extend np.median to handle nans. Could someone review this code and give me some feedback on it before I submit a pull request for it? It looks good to submit as a pull request but probably will need some changes like the mixed sign thing already mentioned, and I see mean vs. median copypaste remnants in the docstring. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Requesting Code Review of nanmedian ENH
It doesn't deal with numpy.matrix in the same way as numpy.nanmean. never mind about this -- it looks like np.median is currently broken for np.matrix. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Requesting Code Review of nanmedian ENH
On Sun, Feb 16, 2014 at 1:01 PM, David Freese dfre...@stanford.edu wrote: the 0s put into the array copy arr are not used in computation. The _replace_nan call is used primarily to generate a mask of the NaNs and make sure it passes the mutation test. I updated the unit tests to reflect negative values, which works. (and the documentation should be cleaned up now) https://github.com/dfreese/numpy/compare/master...feature;nanmedian It doesn't deal with numpy.matrix in the same way as numpy.nanmean. For example import numpy as np m = np.matrix([[np.nan, np.nan, 1]]) np.isscalar(np.nanmean(m)) True np.isscalar(np.nanmedian(m)) False Some of the nanfunctions.py code has comments regarding carefulness in dealing with subclasses of numpy.ndarray (like numpy.matrix), and some of the nanfunctions tests include tests for this kind of behavior. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Suggestions for GSoC Projects
On Wed, Feb 12, 2014 at 2:11 AM, Pauli Virtanen p...@iki.fi wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Hi It's not so often someone wants to work on scipy.special, so you'd be welcome to improve it :) That's great! Thanks a lot for your guidance. . - Spherical harmonics might be a reasonable part of a GSoC proposal. However, note that there exists also a *second* Legendre polynomial function `lpmv`, which doesn't store the values of the previous N functions. There's one numerical problem in the current way of evaluation via ~Pmn(cos(theta)), which is that this approach seems to lose relatively much precision at large orders for certain values of theta. I don't recall now exactly how imprecise it becomes at large orders, but it may be necessary to check. I checked lpmv and lpmn. As you rightly said, lpmv avoids the storage of values small N's by using recursion. Why can't we first check if m and n are n positive integers and pass them to lpmv itself rather than lpmn? lpmn does give us the derivatives too, but sph_harm has no need for that, and of course all the values for m and n too are trimmed. What is the benefit of lpmn over lpmv at present? Adding new special functions also sounds like an useful project. Here, it helps if they are something that you expect you will need later on :) I am unable to think beyond ellipsoidal functions. As for their use, we as students used them in problems of thermal equilibrium in ellipsoidal bodies, and some scattering cases. Though I have no idea if its use in general is quite prominent. And cylindrical harmonic function would be useful but I feel it's quite straight forward to implement (correct me if I am wrong) . There's also the case that several of the functions in Scipy have only implementations for real-valued inputs, although the functions would be defined on the whole complex plane. A list of the situation is here: https://github.com/scipy/scipy/blob/master/scipy/special /generate_ufuncs.py#L85 Lowercase d correspond to real-valued implementations, uppercase D to complex-valued. I'm not at the moment completely sure which would have the highest priority --- whether you need this or not really depends on the application. If you want additional ideas about possible things to fix in scipy.special, take a look at this file: https://github.com/scipy/scipy/blob/master/scipy/special/tests /test_mpmath.py#L648 The entries marked @knownfailure* have some undiagnosed issues in the implementation, which might be useful to look into. However: most of these have to do with corner cases in hypergeometric functions. Trying to address those is likely a risky GSoC topic, as the multi-argument hyp* functions are challenging to evaluate in floating point. (mpmath and Mathematica can evaluate them in most parameter regimes, but AFAIK both require arbitrary-precision methods for this.) Yeah, many of the known failures seem to revolve around hyp2f1. An unexplained inclination towards hypergeometric functions really tempts me to plunge into this. If it's too risky, I can work on this after the summers, as I would have gained quite a lot of experience with the code here. So I think there would be a large number of possible things to do here, and help would be appreciated. - -- Pauli Virtanen -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.14 (GNU/Linux) iEYEARECAAYFAlL6iwAACgkQ6BQxb7O0pWBfOgCfYHAB12N4FWDmrqx8/ORTBRps pXYAoL3ufAiShe+0qTEGfEvrmDgr1X0p =kAwF -END PGP SIGNATURE- Regard Jennifer ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] ANN: Scipy 0.13.3 release
Binaries are now available on SourceForge. Ralf On Tue, Feb 4, 2014 at 8:16 AM, Ralf Gommers ralf.gomm...@gmail.com wrote: Hi, I'm happy to announce the availability of the scipy 0.13.3 release. This is a bugfix only release; it contains fixes for regressions in ndimage and weave. Source tarballs can be found at https://sourceforge.net/projects/scipy/files/scipy/0.13.3/ and on PyPi. Release notes copied below, binaries will follow later (the regular build machine is not available for the next two weeks). Cheers, Ralf == SciPy 0.13.3 Release Notes == SciPy 0.13.3 is a bug-fix release with no new features compared to 0.13.2. Both the weave and the ndimage.label bugs were severe regressions in 0.13.0, hence this release. Issues fixed - 3148: fix a memory leak in ``ndimage.label``. - 3216: fix weave issue with too long file names for MSVC. Other changes - - Update Sphinx theme used for html docs so in examples can be toggled. Checksums = 0547c1f8e8afad4009cc9b5ef17a2d4d release/installers/scipy-0.13.3.tar.gz 20ff3a867cc5925ef1d654aed2ff7e88 release/installers/scipy-0.13.3.zip ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] argsort speed
currently using numpy 1.6.1 What's the fastest argsort for a 1d array with around 28 Million elements, roughly uniformly distributed, random order? Is there a reason that np.argsort is almost 3 times slower than np.sort? I'm doing semi-systematic timing for a stats(models) algorithm. Josef ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] argsort speed
My guess; First of all, you are actually manipulating twice as much data as opposed to an inplace sort. Moreover, an inplace sort gains locality as it is being sorted, whereas the argsort is continuously making completely random memory accesses. -Original Message- From: josef.p...@gmail.com Sent: Sunday, February 16, 2014 11:43 PM To: Discussion of Numerical Python Subject: [Numpy-discussion] argsort speed currently using numpy 1.6.1 What's the fastest argsort for a 1d array with around 28 Million elements, roughly uniformly distributed, random order? Is there a reason that np.argsort is almost 3 times slower than np.sort? I'm doing semi-systematic timing for a stats(models) algorithm. Josef ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] argsort speed
On 16 February 2014 23:43, josef.p...@gmail.com wrote: What's the fastest argsort for a 1d array with around 28 Million elements, roughly uniformly distributed, random order? On numpy latest version: for kind in ['quicksort', 'mergesort', 'heapsort']: print kind %timeit np.sort(data, kind=kind) %timeit np.argsort(data, kind=kind) quicksort 1 loops, best of 3: 3.55 s per loop 1 loops, best of 3: 10.3 s per loop mergesort 1 loops, best of 3: 4.84 s per loop 1 loops, best of 3: 9.49 s per loop heapsort 1 loops, best of 3: 12.1 s per loop 1 loops, best of 3: 39.3 s per loop It looks quicksort is quicker sorting, but mergesort is marginally faster sorting args. The diference is slim, but upon repetition, it remains significant. Why is that? Probably part of the reason is what Eelco said, and part is that in sort comparison are done accessing the array elements directly, but in argsort you have to index the array, introducing some overhead. I seem unable to find the code for ndarray.sort, so I can't check. I have tried to grep it tring all possible combinations of def ndarray, self.sort, etc. Where is it? /David. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] argsort speed
On 17 February 2014 00:12, Daπid davidmen...@gmail.com wrote: I seem unable to find the code for ndarray.sort, so I can't check. I have tried to grep it tring all possible combinations of def ndarray, self.sort, etc. Where is it? Nevermind, it is in core/src/multiarray/methods.c ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] argsort speed
On Sun, Feb 16, 2014 at 5:50 PM, Eelco Hoogendoorn hoogendoorn.ee...@gmail.com wrote: My guess; First of all, you are actually manipulating twice as much data as opposed to an inplace sort. Moreover, an inplace sort gains locality as it is being sorted, whereas the argsort is continuously making completely random memory accesses. -Original Message- From: josef.p...@gmail.com Sent: Sunday, February 16, 2014 11:43 PM To: Discussion of Numerical Python Subject: [Numpy-discussion] argsort speed currently using numpy 1.6.1 What's the fastest argsort for a 1d array with around 28 Million elements, roughly uniformly distributed, random order? Is there a reason that np.argsort is almost 3 times slower than np.sort? I'm doing semi-systematic timing for a stats(models) algorithm. I was using np.sort, inplace sort is only a little bit faster It looks like sorting first, and then argsorting is faster than argsort alon pvals.sort() sortind = np.argsort(pvals) replacing the inplace sort in the above reduces speed only a bit -- import time use_master = True if use_master: import sys sys.path.insert(0, rE:\Josef\!reps\numpy\dist\Programs\Python27\Lib\site-packages) import numpy as np print np.__version__ =, np.__version__ n = 5300 pvals = np.random.rand(n**2) t0 = time.time() p = np.sort(pvals) t1 = time.time() sortind = np.argsort(pvals) t2 = time.time() pvals.sort() sortind = np.argsort(pvals) t3 = time.time() print t1 - t0, t2 - t1, t3 - t2 print (t2 - t1) * 1. / (t1 - t0), (t3 - t2) * 1. / (t1 - t0) np.__version__ = 1.9.0.dev-2868dc4 3.91900014877 9.556218 4.92900013924 2.43863219163 1.2577187936 Josef Josef ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] argsort speed
On Sun, Feb 16, 2014 at 6:15 PM, josef.p...@gmail.com wrote: On Sun, Feb 16, 2014 at 5:50 PM, Eelco Hoogendoorn hoogendoorn.ee...@gmail.com wrote: My guess; First of all, you are actually manipulating twice as much data as opposed to an inplace sort. Moreover, an inplace sort gains locality as it is being sorted, whereas the argsort is continuously making completely random memory accesses. -Original Message- From: josef.p...@gmail.com Sent: Sunday, February 16, 2014 11:43 PM To: Discussion of Numerical Python Subject: [Numpy-discussion] argsort speed currently using numpy 1.6.1 What's the fastest argsort for a 1d array with around 28 Million elements, roughly uniformly distributed, random order? Is there a reason that np.argsort is almost 3 times slower than np.sort? I'm doing semi-systematic timing for a stats(models) algorithm. I was using np.sort, inplace sort is only a little bit faster It looks like sorting first, and then argsorting is faster than argsort alon pvals.sort() sortind = np.argsort(pvals) Ok, that was useless, that won't be anything I want. Josef replacing the inplace sort in the above reduces speed only a bit -- import time use_master = True if use_master: import sys sys.path.insert(0, rE:\Josef\!reps\numpy\dist\Programs\Python27\Lib\site-packages) import numpy as np print np.__version__ =, np.__version__ n = 5300 pvals = np.random.rand(n**2) t0 = time.time() p = np.sort(pvals) t1 = time.time() sortind = np.argsort(pvals) t2 = time.time() pvals.sort() sortind = np.argsort(pvals) t3 = time.time() print t1 - t0, t2 - t1, t3 - t2 print (t2 - t1) * 1. / (t1 - t0), (t3 - t2) * 1. / (t1 - t0) np.__version__ = 1.9.0.dev-2868dc4 3.91900014877 9.556218 4.92900013924 2.43863219163 1.2577187936 Josef Josef ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Suggestions for GSoC Projects
16.02.2014 23:34, Jennifer stone kirjoitti: [clip] Yeah, many of the known failures seem to revolve around hyp2f1. An unexplained inclination towards hypergeometric functions really tempts me to plunge into this. If it's too risky, I can work on this after the summers, as I would have gained quite a lot of experience with the code here. If you are interested in the hypergeometric numerical evaluation, it's probably a good idea to take a look at this recent master's thesis written on the problem: http://people.maths.ox.ac.uk/porterm/research/pearson_final.pdf This may give some systematic overview on the range of methods available. (Note that for copyright reasons, it's not a good idea to look closely at the source codes linked from that thesis, as they are not available under a compatible license.) It may well be that the best approach for evaluating these functions, if accuracy in the whole parameter range is wanted, in the end turns out to require arbitrary-precision computations. In that case, it would be a very good idea to look at how the problem is approached in mpmath. There are existing multiprecision packages written in C, and using one of them in scipy.special could bring better evaluation performance even if the algorithm is the same. -- Pauli Virtanen ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] argsort speed
On Sun, Feb 16, 2014 at 6:12 PM, Daπid davidmen...@gmail.com wrote: On 16 February 2014 23:43, josef.p...@gmail.com wrote: What's the fastest argsort for a 1d array with around 28 Million elements, roughly uniformly distributed, random order? On numpy latest version: for kind in ['quicksort', 'mergesort', 'heapsort']: print kind %timeit np.sort(data, kind=kind) %timeit np.argsort(data, kind=kind) quicksort 1 loops, best of 3: 3.55 s per loop 1 loops, best of 3: 10.3 s per loop mergesort 1 loops, best of 3: 4.84 s per loop 1 loops, best of 3: 9.49 s per loop heapsort 1 loops, best of 3: 12.1 s per loop 1 loops, best of 3: 39.3 s per loop It looks quicksort is quicker sorting, but mergesort is marginally faster sorting args. The diference is slim, but upon repetition, it remains significant. Why is that? Probably part of the reason is what Eelco said, and part is that in sort comparison are done accessing the array elements directly, but in argsort you have to index the array, introducing some overhead. Thanks, both. I also gain a second with mergesort. matlab would be nicer in my case, it returns both. I still need to use the argsort to index into the array to also get the sorted array. Josef I seem unable to find the code for ndarray.sort, so I can't check. I have tried to grep it tring all possible combinations of def ndarray, self.sort, etc. Where is it? /David. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] argsort speed
On Sun, Feb 16, 2014 at 4:18 PM, josef.p...@gmail.com wrote: On Sun, Feb 16, 2014 at 6:15 PM, josef.p...@gmail.com wrote: On Sun, Feb 16, 2014 at 5:50 PM, Eelco Hoogendoorn hoogendoorn.ee...@gmail.com wrote: My guess; First of all, you are actually manipulating twice as much data as opposed to an inplace sort. Moreover, an inplace sort gains locality as it is being sorted, whereas the argsort is continuously making completely random memory accesses. -Original Message- From: josef.p...@gmail.com Sent: Sunday, February 16, 2014 11:43 PM To: Discussion of Numerical Python Subject: [Numpy-discussion] argsort speed currently using numpy 1.6.1 What's the fastest argsort for a 1d array with around 28 Million elements, roughly uniformly distributed, random order? Is there a reason that np.argsort is almost 3 times slower than np.sort? I'm doing semi-systematic timing for a stats(models) algorithm. I was using np.sort, inplace sort is only a little bit faster It looks like sorting first, and then argsorting is faster than argsort alon pvals.sort() sortind = np.argsort(pvals) Ok, that was useless, that won't be anything I want. I think locality is the most important thing. The argsort routines used to move both the indexes and the array to argsort, the new ones only move the indexes. It is a tradeoff, twice as many moves vs locality. It's probably possible to invent an algorithm that mixes the two. snip Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] argsort speed
On Sun, Feb 16, 2014 at 7:13 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Sun, Feb 16, 2014 at 4:18 PM, josef.p...@gmail.com wrote: On Sun, Feb 16, 2014 at 6:15 PM, josef.p...@gmail.com wrote: On Sun, Feb 16, 2014 at 5:50 PM, Eelco Hoogendoorn hoogendoorn.ee...@gmail.com wrote: My guess; First of all, you are actually manipulating twice as much data as opposed to an inplace sort. Moreover, an inplace sort gains locality as it is being sorted, whereas the argsort is continuously making completely random memory accesses. -Original Message- From: josef.p...@gmail.com Sent: Sunday, February 16, 2014 11:43 PM To: Discussion of Numerical Python Subject: [Numpy-discussion] argsort speed currently using numpy 1.6.1 What's the fastest argsort for a 1d array with around 28 Million elements, roughly uniformly distributed, random order? Is there a reason that np.argsort is almost 3 times slower than np.sort? I'm doing semi-systematic timing for a stats(models) algorithm. I was using np.sort, inplace sort is only a little bit faster It looks like sorting first, and then argsorting is faster than argsort alon pvals.sort() sortind = np.argsort(pvals) Ok, that was useless, that won't be anything I want. I think locality is the most important thing. The argsort routines used to move both the indexes and the array to argsort, the new ones only move the indexes. It is a tradeoff, twice as many moves vs locality. It's probably possible to invent an algorithm that mixes the two. If that's the way it is, then that's the way it is. I just never realized that argsort can take so long, since I usually use only smaller arrays. I was surprised that argsort took almost 10 out of around 12 seconds in my function, and this after I cleaned up my code and removed duplicate and avoidable argsorts. Josef snip Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] argsort speed
On Sun, Feb 16, 2014 at 4:12 PM, Daπid davidmen...@gmail.com wrote: On 16 February 2014 23:43, josef.p...@gmail.com wrote: What's the fastest argsort for a 1d array with around 28 Million elements, roughly uniformly distributed, random order? On numpy latest version: for kind in ['quicksort', 'mergesort', 'heapsort']: print kind %timeit np.sort(data, kind=kind) %timeit np.argsort(data, kind=kind) quicksort 1 loops, best of 3: 3.55 s per loop 1 loops, best of 3: 10.3 s per loop mergesort 1 loops, best of 3: 4.84 s per loop 1 loops, best of 3: 9.49 s per loop heapsort 1 loops, best of 3: 12.1 s per loop 1 loops, best of 3: 39.3 s per loop Interesting. I get slightly different results quicksort 1 loops, best of 3: 3.25 s per loop 1 loops, best of 3: 6.16 s per loop mergesort 1 loops, best of 3: 3.99 s per loop 1 loops, best of 3: 6.97 s per loop heapsort 1 loops, best of 3: 10.1 s per loop 1 loops, best of 3: 29.3 s per loop Possibly faster memory here. snip Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion