On 02/11/2012 04:20 PM, Peng Yu wrote: > Hi, > > I assume the time complexity of 'sort' is log N, where N is the input size. > > But I'm not familiar with 'sort' enough to tell the complexity of > sorting a nearly sorted input. Suppose that I have a listed of N > numbers, there only k numbers (k << N, say k=N/100) that are not in > the correct position and all the other N-k numbers are in the correct > position. Does anybody knows the time complexity of this case, or it > is still log N? >
Well discounting all the memory, thread and data type management that `sort` gives you, centrally there is this description on sequential_sort(): Use a recursive divide-and-conquer algorithm, in the style suggested by Knuth volume 3 (2nd edition), exercise 5.2.4-23. Use the optimization suggested by exercise 5.2.4-10; this requires room for only 1.5*N lines, rather than the usual 2*N lines. Knuth writes that this memory optimization was originally published by D. A. Bell, Comp J. 1 (1958), 75. I searched for the above text out of interest, and noticed this good overview of the operation of `sort`: http://philip.greenspun.com/software/abstraction-filtration-comparison/sort/ It's trivial to empirically measure how sort behaves with different inputs: $ for i in $(seq 4 7); do > seq -w $((10**$i)) > sorted.$i > shuf < sorted.$i > unsorted.$i > done $ export OMP_NUM_THREADS=1 $ export LANG=C $ for i in $(seq 4 7); do > time sort sorted.$i > /dev/null > time sort unsorted.$i > /dev/null > done We can also compare python's in place "timsort", given that the data set fits in RAM and timsort is said to be especially good for mostly sorted data: $ for type in sorted unsorted; do > time python -c " > import sys > a=sys.stdin.readlines() > a.sort() > sys.stdout.writelines(a) > " > /dev/null < $type.7 > done Results: N sorted unsorted py_sorted py_unsorted 10000 0.015 0.021 100000 0.058 0.088 1000000 0.522 1.105 10000000 7.020 15.644 2.882 26.285 cheers, Pádraig.
