A Saturday 09 February 2008, Charles R Harris escrigué: > I'm also thinking that at some point it becomes more efficient to do > a indirect sort followed by take than to move all those big strings > around. But I guess we won't know where that point is until we have > both versions available.
I've done some experiments in that matter too. They are saying that, with the current mergesort in NumPy, an indirect sort followed by take performs similarly to direct sort for small string lengths (<=16), but indirect sort starts to win afterwards. The version with quicksort and optimized sSWAP should be between 2x and 3x times faster than current mergesort implementation, so the advantage for direct sort could grow up to somewhere between 50 and 100. A nice idea could be doing some more toughful experiments in order to find the point where an indirect sort followed by a take would be more efficient and automatically select this method beyond that point. However, this has the drawback that you have to use additional memory for keeping the indices in the indirect method. Of course, when strings are large, those indices should take a rather negligible space compared with strings itself. In any case, in some situations where space is critical, this can still be important. I don't know, but my opinion is that we shouldn't take too agressive optimizations for that matter. My vote is to document this possibility in the docstrings, so that the user wanting for extreme performance can use this approach if he wants to. Still, for string sizes greater than, say, 1000, well, an automatic selection of the indirect method is very tempting indeed. Cheers, -- >0,0< Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data "-" _______________________________________________ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion