Using Eric's latest speed-testing, here are David's results:

[EMAIL PROTECTED]:~/code_snippets/histogram$ python histogram_speed.py
type: uint8
millions of elements: 100.0
sec (C indexing based): 8.44 100000000
sec (numpy iteration based): 8.91 100000000
sec (rick's pure python): 6.4 100000000
sec (nd evenly spaced): 2.1 100000000
sec (1d evenly spaced): 1.33 100000000
sec (david huard): 35.84 100000000
Summary:
case                      sec         speed-up
weave_1d_arbitrary        8.440000    0.758294
weave_nd_arbitrary        8.910000    0.718294
ricks_arbitrary           6.400000    1.000000
weave_nd_even             2.100000    3.047619
weave_1d_even             1.330000    4.812030
david_huard               35.840000   0.178571

I also tried this on an equal-sized sample of my real-world data: 100 image
slices, 8 bits/sample, 1000x1000 pixels per image. The full data set is 489
image slices, but I was unable to randomly generate 489 million data samples
because I ran out of memory and started thrashing the page file, which would
have ruined any timing results. So I've compared like with like and got the
following results with real-world data:

type: uint8
millions of elements: 100.0
sec (C indexing based): 6.1 100000000
sec (numpy iteration based): 7.07 100000000
sec (rick's pure python): 4.77 100000000
sec (nd evenly spaced): 2.12 100000000
sec (1d evenly spaced): 1.33 100000000
sec (david huard): 16.47 100000000

Summary:
case                      sec         speed-up
weave_1d_arbitrary        6.100000    0.781967
weave_nd_arbitrary        7.070000    0.674682
ricks_arbitrary           4.770000    1.000000
weave_nd_even             2.120000    2.250000
weave_1d_even             1.330000    3.586466
david_huard               16.470000   0.289617

Note how much faster some of the algorithms run on the non-random,
real-world data. I assume this is because the running time of the
quick-sort algorithm varies with the starting order of the data? (A quick
way to test this is sketched in the P.P.S. below.) Scaling with the full
data set was similar.

Unfortunately, David's code was not able to load the entire 489 image
slices, throwing the same error as the one mentioned in the first email in
this thread.

Later parts of the project I am working on will probably require iteration
over the entire data set, and iteration seems to be what is slowing down
several of these histogram algorithms, which is what makes the sort()
approach necessary. I'll have a look at the iterator and see if there's
anything that can be done there instead. I'm hoping it will be possible to
use a C-based iterator for a numpy multiarray, as this would allow many
data processing algorithms to run faster, not just the histogram. (A rough
sketch of the chunked accumulation I have in mind is in the P.S. below.)

Once again, thanks to everyone for all your input. This seems to have
generated more discussion and action than I anticipated, for which I am
very grateful.

Best regards,
Cameron.
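P.S. For iterating over the full data set without holding all 489 slices in
memory at once, something along these lines might work: accumulate per-slice
counts with numpy's bincount(), which runs at C speed, so only one slice
needs to be resident at a time. This is only a rough sketch, not anyone's
posted code; load_slice() is a hypothetical stand-in for however the image
slices are actually read:

    import numpy as np

    def load_slice(i):
        # Hypothetical stand-in for the real image reader: returns one
        # 1000x1000 uint8 image slice as a numpy array.
        return np.random.randint(0, 256, (1000, 1000)).astype(np.uint8)

    hist = np.zeros(256, dtype=np.int64)  # one bin per possible uint8 value
    for i in range(489):
        counts = np.bincount(load_slice(i).ravel())
        # bincount's output length depends on the largest value seen,
        # so pad the accumulation rather than assuming 256 entries.
        hist[:counts.size] += counts

Since each uint8 slice is only about a megabyte, memory use stays flat no
matter how many slices are processed.

P.P.S. One way to test the quick-sort speculation above would be to time
numpy's sort() on a random array versus an already-ordered copy of the same
data, e.g.:

    import time
    import numpy as np

    data = np.random.randint(0, 256, 10**7).astype(np.uint8)
    for label, arr in [("random", data), ("pre-sorted", np.sort(data))]:
        t0 = time.time()
        np.sort(arr, kind="quicksort")  # sorts a copy; arr is untouched
        print(label, time.time() - t0)

If the pre-sorted copy sorts substantially faster, that would support the
explanation for the real-world timings above.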