[Numpy-discussion] Requesting Code Review of nanmedian ENH

2014-02-16 Thread David Freese
Hi everyone,

I put together a np.nanmedian function to extend np.median to handle nans.
 Could someone review this code and give me some feedback on it before I
submit a pull request for it?

https://github.com/dfreese/numpy/compare/master...feature;nanmedian

Thanks,
David
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Requesting Code Review of nanmedian ENH

2014-02-16 Thread Eelco Hoogendoorn
hi david,

I havnt run the code; but the _replace_nan(0) call worries me; especially
considering that the unit tests seem to deal with positive numbers
exclusively. Have you tested with mixed positive/negative inputs?



On Sun, Feb 16, 2014 at 6:13 PM, David Freese dfre...@stanford.edu wrote:

 Hi everyone,

 I put together a np.nanmedian function to extend np.median to handle nans.
  Could someone review this code and give me some feedback on it before I
 submit a pull request for it?

 https://github.com/dfreese/numpy/compare/master...feature;nanmedian

 Thanks,
 David

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Requesting Code Review of nanmedian ENH

2014-02-16 Thread alex
On Sun, Feb 16, 2014 at 12:13 PM, David Freese dfre...@stanford.edu wrote:
 Hi everyone,

 I put together a np.nanmedian function to extend np.median to handle nans.
 Could someone review this code and give me some feedback on it before I
 submit a pull request for it?

It looks good to submit as a pull request but probably will need some
changes like the mixed sign thing already mentioned, and I see mean
vs. median copypaste remnants in the docstring.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Requesting Code Review of nanmedian ENH

2014-02-16 Thread David Freese
the 0s put into the array copy arr are not used in computation.  The
_replace_nan call is used primarily to generate a mask of the NaNs and make
sure it passes the mutation test.  I updated the unit tests to reflect
negative values, which works.  (and the documentation should be cleaned up
now)

https://github.com/dfreese/numpy/compare/master...feature;nanmedian



On Sun, Feb 16, 2014 at 9:52 AM, alex argri...@ncsu.edu wrote:

 On Sun, Feb 16, 2014 at 12:13 PM, David Freese dfre...@stanford.edu
 wrote:
  Hi everyone,
 
  I put together a np.nanmedian function to extend np.median to handle
 nans.
  Could someone review this code and give me some feedback on it before I
  submit a pull request for it?

 It looks good to submit as a pull request but probably will need some
 changes like the mixed sign thing already mentioned, and I see mean
 vs. median copypaste remnants in the docstring.
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Requesting Code Review of nanmedian ENH

2014-02-16 Thread alex
 It doesn't deal with numpy.matrix in the same way as numpy.nanmean.

never mind about this -- it looks like np.median is currently broken
for np.matrix.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Requesting Code Review of nanmedian ENH

2014-02-16 Thread alex
On Sun, Feb 16, 2014 at 1:01 PM, David Freese dfre...@stanford.edu wrote:
 the 0s put into the array copy arr are not used in computation.  The
 _replace_nan call is used primarily to generate a mask of the NaNs and make
 sure it passes the mutation test.  I updated the unit tests to reflect
 negative values, which works.  (and the documentation should be cleaned up
 now)

 https://github.com/dfreese/numpy/compare/master...feature;nanmedian

It doesn't deal with numpy.matrix in the same way as numpy.nanmean.  For example


 import numpy as np
 m = np.matrix([[np.nan, np.nan, 1]])
 np.isscalar(np.nanmean(m))
True
 np.isscalar(np.nanmedian(m))
False


Some of the nanfunctions.py code has comments regarding carefulness in
dealing with subclasses of numpy.ndarray (like numpy.matrix), and some
of the nanfunctions tests include tests for this kind of behavior.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Suggestions for GSoC Projects

2014-02-16 Thread Jennifer stone
On Wed, Feb 12, 2014 at 2:11 AM, Pauli Virtanen p...@iki.fi wrote:

 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

 Hi

 It's not so often someone wants to work on scipy.special, so you'd be
 welcome to improve it :)

 That's great! Thanks a lot for your guidance. .


 -

 Spherical harmonics might be a reasonable part of a GSoC proposal.
 However, note that there exists also a *second* Legendre polynomial
 function `lpmv`, which doesn't store the values of the previous N
 functions. There's one numerical problem in the current way of
 evaluation via ~Pmn(cos(theta)), which is that this approach seems to
 lose relatively much precision at large orders for certain values of
 theta. I don't recall now exactly how imprecise it becomes at large
 orders, but it may be necessary to check.

 I checked lpmv and lpmn. As you rightly said, lpmv avoids the storage of
values small N's by using recursion. Why can't we first check if m and n are
n positive integers and pass them to lpmv itself rather than lpmn? lpmn does
give us the derivatives too, but sph_harm has no need for that, and of
course
all the values for m and n too are trimmed. What is the benefit of lpmn
over lpmv at present?


 Adding new special functions also sounds like an useful project. Here,
 it helps if they are something that you expect you will need later on :)


I am unable to think beyond ellipsoidal functions. As for their use, we as
students used them in problems of thermal equilibrium in ellipsoidal bodies,
and some scattering cases. Though I have no idea if its use in general is
quite prominent. And cylindrical harmonic function would be useful but I
feel
it's quite straight forward to implement (correct me if I am wrong) .



 There's also the case that several of the functions in Scipy have only
 implementations for real-valued inputs, although the functions would
 be defined on the whole complex plane. A list of the situation is here:

 https://github.com/scipy/scipy/blob/master/scipy/special
 /generate_ufuncs.py#L85

 Lowercase d correspond to real-valued implementations, uppercase D to
 complex-valued. I'm not at the moment completely sure which would have
 the highest priority --- whether you need this or not really depends
 on the application.


 If you want additional ideas about possible things to fix in
 scipy.special, take a look at this file:

 https://github.com/scipy/scipy/blob/master/scipy/special/tests
 /test_mpmath.py#L648

 The entries marked @knownfailure* have some undiagnosed issues in the
 implementation, which might be useful to look into. However: most of
 these have to do with corner cases in hypergeometric functions. Trying
 to address those is likely a risky GSoC topic, as the multi-argument
 hyp* functions are challenging to evaluate in floating point. (mpmath
 and Mathematica can evaluate them in most parameter regimes, but AFAIK
 both require arbitrary-precision methods for this.)


Yeah, many of the known failures seem to revolve around hyp2f1. An
unexplained
inclination towards hypergeometric functions really tempts me to plunge
into this.
If it's too risky, I can work on this after the summers, as I would have
gained quite
a lot of experience with the code here.

 So I think there would be a large number of possible things to do
 here, and help would be appreciated.

 - --
 Pauli Virtanen
 -BEGIN PGP SIGNATURE-
 Version: GnuPG v1.4.14 (GNU/Linux)

 iEYEARECAAYFAlL6iwAACgkQ6BQxb7O0pWBfOgCfYHAB12N4FWDmrqx8/ORTBRps
 pXYAoL3ufAiShe+0qTEGfEvrmDgr1X0p
 =kAwF
 -END PGP SIGNATURE-


Regard
Jennifer
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ANN: Scipy 0.13.3 release

2014-02-16 Thread Ralf Gommers
Binaries are now available on SourceForge.

Ralf



On Tue, Feb 4, 2014 at 8:16 AM, Ralf Gommers ralf.gomm...@gmail.com wrote:

 Hi,

 I'm happy to announce the availability of the scipy 0.13.3 release. This
 is a bugfix only release; it contains fixes for regressions in ndimage and
 weave.

 Source tarballs can be found at
 https://sourceforge.net/projects/scipy/files/scipy/0.13.3/ and on PyPi.
 Release notes copied below, binaries will follow later (the regular build
 machine is not available for the next two weeks).

 Cheers,
 Ralf



 ==
 SciPy 0.13.3 Release Notes
 ==

 SciPy 0.13.3 is a bug-fix release with no new features compared to 0.13.2.
 Both the weave and the ndimage.label bugs were severe regressions in
 0.13.0,
 hence this release.

 Issues fixed
 
 - 3148: fix a memory leak in ``ndimage.label``.
 - 3216: fix weave issue with too long file names for MSVC.

 Other changes
 -
 - Update Sphinx theme used for html docs so  in examples can be
 toggled.

 Checksums
 =
 0547c1f8e8afad4009cc9b5ef17a2d4d  release/installers/scipy-0.13.3.tar.gz
 20ff3a867cc5925ef1d654aed2ff7e88  release/installers/scipy-0.13.3.zip

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] argsort speed

2014-02-16 Thread josef . pktd
currently using numpy 1.6.1

What's the fastest argsort for a 1d array with around 28 Million
elements, roughly uniformly distributed, random order?

Is there a reason that np.argsort is almost 3 times slower than np.sort?

I'm doing semi-systematic timing for a stats(models) algorithm.

Josef
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] argsort speed

2014-02-16 Thread Eelco Hoogendoorn

My guess;

First of all, you are actually manipulating twice as much data as opposed to 
an inplace sort.

Moreover, an inplace sort gains locality as it is being sorted, whereas the 
argsort is continuously making completely random memory accesses.


-Original Message- 
From: josef.p...@gmail.com
Sent: Sunday, February 16, 2014 11:43 PM
To: Discussion of Numerical Python
Subject: [Numpy-discussion] argsort speed

currently using numpy 1.6.1

What's the fastest argsort for a 1d array with around 28 Million
elements, roughly uniformly distributed, random order?

Is there a reason that np.argsort is almost 3 times slower than np.sort?

I'm doing semi-systematic timing for a stats(models) algorithm.

Josef
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion 

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] argsort speed

2014-02-16 Thread Daπid
On 16 February 2014 23:43, josef.p...@gmail.com wrote:

 What's the fastest argsort for a 1d array with around 28 Million
 elements, roughly uniformly distributed, random order?


On numpy latest version:

for kind in ['quicksort', 'mergesort', 'heapsort']:
print kind
%timeit np.sort(data, kind=kind)
%timeit np.argsort(data, kind=kind)


quicksort
1 loops, best of 3: 3.55 s per loop
1 loops, best of 3: 10.3 s per loop
mergesort
1 loops, best of 3: 4.84 s per loop
1 loops, best of 3: 9.49 s per loop
heapsort
1 loops, best of 3: 12.1 s per loop
1 loops, best of 3: 39.3 s per loop


It looks quicksort is quicker sorting, but mergesort is marginally faster
sorting args. The diference is slim, but upon repetition, it remains
significant.

Why is that? Probably part of the reason is what Eelco said, and part is
that in sort comparison are done accessing the array elements directly, but
in argsort you have to index the array, introducing some overhead.

I seem unable to find the code for ndarray.sort, so I can't check. I have
tried to grep it tring all possible combinations of def ndarray,
self.sort, etc. Where is it?


/David.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] argsort speed

2014-02-16 Thread Daπid
On 17 February 2014 00:12, Daπid davidmen...@gmail.com wrote:

 I seem unable to find the code for ndarray.sort, so I can't check. I have
 tried to grep it tring all possible combinations of def ndarray,
 self.sort, etc. Where is it?



Nevermind, it is in core/src/multiarray/methods.c
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] argsort speed

2014-02-16 Thread josef . pktd
On Sun, Feb 16, 2014 at 5:50 PM, Eelco Hoogendoorn
hoogendoorn.ee...@gmail.com wrote:

 My guess;

 First of all, you are actually manipulating twice as much data as opposed to
 an inplace sort.

 Moreover, an inplace sort gains locality as it is being sorted, whereas the
 argsort is continuously making completely random memory accesses.


 -Original Message-
 From: josef.p...@gmail.com
 Sent: Sunday, February 16, 2014 11:43 PM
 To: Discussion of Numerical Python
 Subject: [Numpy-discussion] argsort speed

 currently using numpy 1.6.1

 What's the fastest argsort for a 1d array with around 28 Million
 elements, roughly uniformly distributed, random order?

 Is there a reason that np.argsort is almost 3 times slower than np.sort?

 I'm doing semi-systematic timing for a stats(models) algorithm.

I was using np.sort, inplace sort is only a little bit faster

It looks like sorting first, and then argsorting is faster than argsort alon

pvals.sort()
sortind = np.argsort(pvals)

replacing the inplace sort in the above reduces speed only a bit

--
import time
use_master = True
if use_master:
import sys
sys.path.insert(0,
rE:\Josef\!reps\numpy\dist\Programs\Python27\Lib\site-packages)

import numpy as np
print np.__version__ =, np.__version__

n = 5300
pvals = np.random.rand(n**2)

t0 = time.time()

p = np.sort(pvals)
t1 = time.time()

sortind = np.argsort(pvals)
t2 = time.time()

pvals.sort()
sortind = np.argsort(pvals)
t3 = time.time()

print t1 - t0, t2 - t1, t3 - t2
print (t2 - t1) * 1. / (t1 - t0),  (t3 - t2) * 1. / (t1 - t0)


np.__version__ = 1.9.0.dev-2868dc4
3.91900014877 9.556218 4.92900013924
2.43863219163 1.2577187936

Josef



 Josef
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] argsort speed

2014-02-16 Thread josef . pktd
On Sun, Feb 16, 2014 at 6:15 PM,  josef.p...@gmail.com wrote:
 On Sun, Feb 16, 2014 at 5:50 PM, Eelco Hoogendoorn
 hoogendoorn.ee...@gmail.com wrote:

 My guess;

 First of all, you are actually manipulating twice as much data as opposed to
 an inplace sort.

 Moreover, an inplace sort gains locality as it is being sorted, whereas the
 argsort is continuously making completely random memory accesses.


 -Original Message-
 From: josef.p...@gmail.com
 Sent: Sunday, February 16, 2014 11:43 PM
 To: Discussion of Numerical Python
 Subject: [Numpy-discussion] argsort speed

 currently using numpy 1.6.1

 What's the fastest argsort for a 1d array with around 28 Million
 elements, roughly uniformly distributed, random order?

 Is there a reason that np.argsort is almost 3 times slower than np.sort?

 I'm doing semi-systematic timing for a stats(models) algorithm.

 I was using np.sort, inplace sort is only a little bit faster

 It looks like sorting first, and then argsorting is faster than argsort alon

 pvals.sort()
 sortind = np.argsort(pvals)

Ok, that was useless, that won't be anything I want.

Josef



 replacing the inplace sort in the above reduces speed only a bit

 --
 import time
 use_master = True
 if use_master:
 import sys
 sys.path.insert(0,
 rE:\Josef\!reps\numpy\dist\Programs\Python27\Lib\site-packages)

 import numpy as np
 print np.__version__ =, np.__version__

 n = 5300
 pvals = np.random.rand(n**2)

 t0 = time.time()

 p = np.sort(pvals)
 t1 = time.time()

 sortind = np.argsort(pvals)
 t2 = time.time()

 pvals.sort()
 sortind = np.argsort(pvals)
 t3 = time.time()

 print t1 - t0, t2 - t1, t3 - t2
 print (t2 - t1) * 1. / (t1 - t0),  (t3 - t2) * 1. / (t1 - t0)
 

 np.__version__ = 1.9.0.dev-2868dc4
 3.91900014877 9.556218 4.92900013924
 2.43863219163 1.2577187936

 Josef



 Josef
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Suggestions for GSoC Projects

2014-02-16 Thread Pauli Virtanen
16.02.2014 23:34, Jennifer stone kirjoitti:
[clip]
 Yeah, many of the known failures seem to revolve around hyp2f1. An 
 unexplained inclination towards hypergeometric functions really
 tempts me to plunge into this. If it's too risky, I can work on
 this after the summers, as I would have gained quite a lot of
 experience with the code here.

If you are interested in the hypergeometric numerical evaluation, it's
probably a good idea to take a look at this recent master's thesis
written on the problem:

http://people.maths.ox.ac.uk/porterm/research/pearson_final.pdf

This may give some systematic overview on the range of methods
available. (Note that for copyright reasons, it's not a good idea to
look closely at the source codes linked from that thesis, as they are
not available under a compatible license.)

It may well be that the best approach for evaluating these functions,
if accuracy in the whole parameter range is wanted, in the end turns
out to require arbitrary-precision computations.  In that case, it
would be a very good idea to look at how the problem is approached in
mpmath. There are existing multiprecision packages written in C, and
using one of them in scipy.special could bring better evaluation
performance even if the algorithm is the same.

-- 
Pauli Virtanen

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] argsort speed

2014-02-16 Thread josef . pktd
On Sun, Feb 16, 2014 at 6:12 PM, Daπid davidmen...@gmail.com wrote:
 On 16 February 2014 23:43, josef.p...@gmail.com wrote:

 What's the fastest argsort for a 1d array with around 28 Million
 elements, roughly uniformly distributed, random order?


 On numpy latest version:

 for kind in ['quicksort', 'mergesort', 'heapsort']:
 print kind
 %timeit np.sort(data, kind=kind)
 %timeit np.argsort(data, kind=kind)


 quicksort
 1 loops, best of 3: 3.55 s per loop
 1 loops, best of 3: 10.3 s per loop
 mergesort
 1 loops, best of 3: 4.84 s per loop
 1 loops, best of 3: 9.49 s per loop
 heapsort
 1 loops, best of 3: 12.1 s per loop
 1 loops, best of 3: 39.3 s per loop


 It looks quicksort is quicker sorting, but mergesort is marginally faster
 sorting args. The diference is slim, but upon repetition, it remains
 significant.

 Why is that? Probably part of the reason is what Eelco said, and part is
 that in sort comparison are done accessing the array elements directly, but
 in argsort you have to index the array, introducing some overhead.

Thanks, both.

I also gain a second with mergesort.

matlab would be nicer in my case, it returns both.
I still need to use the argsort to index into the array to also get
the sorted array.

Josef



 I seem unable to find the code for ndarray.sort, so I can't check. I have
 tried to grep it tring all possible combinations of def ndarray,
 self.sort, etc. Where is it?


 /David.


 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] argsort speed

2014-02-16 Thread Charles R Harris
On Sun, Feb 16, 2014 at 4:18 PM, josef.p...@gmail.com wrote:

 On Sun, Feb 16, 2014 at 6:15 PM,  josef.p...@gmail.com wrote:
  On Sun, Feb 16, 2014 at 5:50 PM, Eelco Hoogendoorn
  hoogendoorn.ee...@gmail.com wrote:
 
  My guess;
 
  First of all, you are actually manipulating twice as much data as
 opposed to
  an inplace sort.
 
  Moreover, an inplace sort gains locality as it is being sorted, whereas
 the
  argsort is continuously making completely random memory accesses.
 
 
  -Original Message-
  From: josef.p...@gmail.com
  Sent: Sunday, February 16, 2014 11:43 PM
  To: Discussion of Numerical Python
  Subject: [Numpy-discussion] argsort speed
 
  currently using numpy 1.6.1
 
  What's the fastest argsort for a 1d array with around 28 Million
  elements, roughly uniformly distributed, random order?
 
  Is there a reason that np.argsort is almost 3 times slower than np.sort?
 
  I'm doing semi-systematic timing for a stats(models) algorithm.
 
  I was using np.sort, inplace sort is only a little bit faster
 
  It looks like sorting first, and then argsorting is faster than argsort
 alon
 
  pvals.sort()
  sortind = np.argsort(pvals)

 Ok, that was useless, that won't be anything I want.


I think locality is the most important thing. The argsort routines used to
move both the indexes and the array to argsort, the new ones only move the
indexes. It is a tradeoff, twice as many moves vs locality. It's probably
possible to invent an algorithm that mixes the two.

snip

Chuck
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] argsort speed

2014-02-16 Thread josef . pktd
On Sun, Feb 16, 2014 at 7:13 PM, Charles R Harris
charlesr.har...@gmail.com wrote:



 On Sun, Feb 16, 2014 at 4:18 PM, josef.p...@gmail.com wrote:

 On Sun, Feb 16, 2014 at 6:15 PM,  josef.p...@gmail.com wrote:
  On Sun, Feb 16, 2014 at 5:50 PM, Eelco Hoogendoorn
  hoogendoorn.ee...@gmail.com wrote:
 
  My guess;
 
  First of all, you are actually manipulating twice as much data as
  opposed to
  an inplace sort.
 
  Moreover, an inplace sort gains locality as it is being sorted, whereas
  the
  argsort is continuously making completely random memory accesses.
 
 
  -Original Message-
  From: josef.p...@gmail.com
  Sent: Sunday, February 16, 2014 11:43 PM
  To: Discussion of Numerical Python
  Subject: [Numpy-discussion] argsort speed
 
  currently using numpy 1.6.1
 
  What's the fastest argsort for a 1d array with around 28 Million
  elements, roughly uniformly distributed, random order?
 
  Is there a reason that np.argsort is almost 3 times slower than
  np.sort?
 
  I'm doing semi-systematic timing for a stats(models) algorithm.
 
  I was using np.sort, inplace sort is only a little bit faster
 
  It looks like sorting first, and then argsorting is faster than argsort
  alon
 
  pvals.sort()
  sortind = np.argsort(pvals)

 Ok, that was useless, that won't be anything I want.


 I think locality is the most important thing. The argsort routines used to
 move both the indexes and the array to argsort, the new ones only move the
 indexes. It is a tradeoff, twice as many moves vs locality. It's probably
 possible to invent an algorithm that mixes the two.

If that's the way it is, then that's the way it is.

I just never realized that argsort can take so long, since I usually
use only smaller arrays.
I was surprised that argsort took almost 10 out of around 12 seconds
in my function, and this after I cleaned up my code and removed
duplicate and avoidable argsorts.

Josef



 snip

 Chuck

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] argsort speed

2014-02-16 Thread Charles R Harris
On Sun, Feb 16, 2014 at 4:12 PM, Daπid davidmen...@gmail.com wrote:

 On 16 February 2014 23:43, josef.p...@gmail.com wrote:

 What's the fastest argsort for a 1d array with around 28 Million
 elements, roughly uniformly distributed, random order?


 On numpy latest version:

 for kind in ['quicksort', 'mergesort', 'heapsort']:
 print kind
 %timeit np.sort(data, kind=kind)
 %timeit np.argsort(data, kind=kind)


 quicksort
 1 loops, best of 3: 3.55 s per loop
 1 loops, best of 3: 10.3 s per loop
 mergesort
 1 loops, best of 3: 4.84 s per loop
 1 loops, best of 3: 9.49 s per loop
 heapsort
 1 loops, best of 3: 12.1 s per loop
 1 loops, best of 3: 39.3 s per loop


Interesting. I get slightly different results

quicksort
1 loops, best of 3: 3.25 s per loop
1 loops, best of 3: 6.16 s per loop
mergesort
1 loops, best of 3: 3.99 s per loop
1 loops, best of 3: 6.97 s per loop
heapsort
1 loops, best of 3: 10.1 s per loop
1 loops, best of 3: 29.3 s per loop

Possibly faster memory here.

snip

Chuck
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion