Python: sorting 10,000 records of 10,000 floats each, then finding the max,
min, and mean of the entire 100,000,000-element array (800 MB of 64-bit
floats, as the code below generates) on a 6-year-old white iMac.

     *11.6 seconds.

*This doesn't include the time to generate the 800 MB of random (normal) data.

Try it on your own computer. Here's the copy-paste from mine:

py> import timeit
py> timeit.timeit('big_data.sort(axis=0); big_data.mean(); big_data.max(); big_data.min()',
...               'import numpy; big_data = numpy.random.normal(10, size=10**8).reshape((10**4, 10**4)); print "random data made, starting..."',
...               number=1)
random data made, starting...
    11.597978115081787
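
For Jacob's actual use case (the tab-delimited tables in his message quoted
below), the same NumPy calls apply directly. Here's a minimal sketch, assuming
a purely numeric table; the filename and column index are made up for
illustration:

    import numpy

    # Load the whole tab-delimited table into one float array
    # ('table.txt' and column 2 are hypothetical placeholders).
    data = numpy.loadtxt('table.txt', delimiter='\t')
    col = data[:, 2]

    # Averaging, sigmas, etc. are one pass each:
    print(col.mean(), col.std(), col.min(), col.max())

    # "Sorting and rejecting": drop rows more than 3 sigma out in
    # that column, then sort the survivors by it.
    keep = numpy.abs(col - col.mean()) < 3 * col.std()
    filtered = data[keep]
    ordered = filtered[filtered[:, 2].argsort()]

Note that loadtxt reads the whole file into memory, which should be fine for a
few hundred MB on any machine with a couple of GB of RAM.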

James

On Sep 12, 2012, at 8:32 AM, Jacob Keller wrote:

> Dear List,
> 
> since this probably comes up a lot in manipulation of pdb/reflection files 
> and so on, I was curious what people thought would be the best language for 
> the following: I have some huge (100s MB) tables of tab-delimited data on 
> which I would like to do some math (averaging, sigmas, simple arithmetic, 
> etc) as well as some sorting and rejecting. It can be done in Excel, but this 
> is exceedingly slow even in 64-bit, so I am looking to do it through some 
> scripting. Just as an example, a "sort" that takes >10 min in Excel takes 
> ~10 sec at most with the Unix command sort (seems crazy, no?). Any suggestions?
> 
> Thanks, and sorry for being off-topic,
> 
> Jacob
> 
> -- 
> *******************************************
> Jacob Pearson Keller
> Northwestern University
> Medical Scientist Training Program
> email: [email protected]
> *******************************************
