Re: anybody love qsort.c?
Apparently, someone does love qsort. Akira Wada wrote: | I think it would be difficult, not impossible, to decide the threshold | (Christopher Seiwald said, "# swaps = 1024", and Tim Vanderhoek suggested, | "a ratio: #comparisons : # swaps") and to what point of qsort, to make | isort bail back, effectively, without troubles, and covering every situations. The value 1024 is arbitrary, to be sure, but it is in a very safe range: at 0, the BSD addition of isort to Bentley's qsort is left intact; at infinity, it is entirely disabled. Any number in between will exhibit at worst the behavior of one of the two. The number selection doesn't have to be fancy: as Akira pointed out, my problem seems to be a fringe case that no one else has ever noted. I am therefore more or less selecting the # for me and my data. | Then an improvement would be : | (1) delete the switching to "isort by swap_cnt", | and add the routine detecting sorted completely. Actually, if the data is sorted then the isort does just as many comparisons (and the same zero swaps) as a sort detection routine. | (2) adding (1), make the partitioning logic symmetrical, | for dataset sorted in reversed order to be processed efficiently, | e.g. malloc(size) for pivot instead of "swap(a, pm)" | and some modifications required by this. This starts to get like real code. My intent is not to reengineer qsort, or at all challenge Bentley's algorithm or BSD's addition, but rather temper the latter with a safe, understandable (for me) alteration. By the way, thanks (Akira) for your very nice depiction of the problem of partially sorted data. Christopher To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: anybody love qsort.c?
Apparently, someone does love qsort. Akira Wada wrote: | I think it would be difficult, not impossible, to decide the threshold | (Christopher Seiwald said, # swaps = 1024, and Tim Vanderhoek suggested, | a ratio: #comparisons : # swaps) and to what point of qsort, to make | isort bail back, effectively, without troubles, and covering every situations. The value 1024 is arbitrary, to be sure, but it is in a very safe range: at 0, the BSD addition of isort to Bentley's qsort is left intact; at infinity, it is entirely disabled. Any number in between will exhibit at worst the behavior of one of the two. The number selection doesn't have to be fancy: as Akira pointed out, my problem seems to be a fringe case that no one else has ever noted. I am therefore more or less selecting the # for me and my data. | Then an improvement would be : | (1) delete the switching to isort by swap_cnt, | and add the routine detecting sorted completely. Actually, if the data is sorted then the isort does just as many comparisons (and the same zero swaps) as a sort detection routine. | (2) adding (1), make the partitioning logic symmetrical, | for dataset sorted in reversed order to be processed efficiently, | e.g. malloc(size) for pivot instead of swap(a, pm) | and some modifications required by this. This starts to get like real code. My intent is not to reengineer qsort, or at all challenge Bentley's algorithm or BSD's addition, but rather temper the latter with a safe, understandable (for me) alteration. By the way, thanks (Akira) for your very nice depiction of the problem of partially sorted data. Christopher To Unsubscribe: send mail to majord...@freebsd.org with unsubscribe freebsd-hackers in the body of the message
Re: anybody love qsort.c?
Christopher Seiwald wrote... Archie's mod to qsort: | - if (swap_cnt == 0) { /* Switch to insertion sort */ | + if (n = 32 swap_cnt == 0) { /* Switch to insertion sort */ As Akira Wada points out, this eliminates the benefit of the optimization in the first place, which is to let isort take over if the data is largely sorted in the first place. Akira Wada's alternative mod: | + pl = (char *)a; pn = (char *)a + (n - 1) * es; | + while (pl pn cmp(pl, pl + es) = 0) pl += es; | + if (pl = pn) return; This is a little more comprehensive, but does throw in an extra pass of comparisons, which (I'm sure) someone would complain about. The alteration that I've tried and tested is to have the isort bail back to qsort if it does more than N swaps. I put N at 1024, which worked for my data :).This is almost guaranteed to do no more work than the current logic, because it it only gives up on the isort when it has evidence that the isort is doing too much work. I think it would be difficult, not impossible, to decide the threshold (Christopher Seiwald said, "# swaps = 1024", and Tim Vanderhoek suggested, "a ratio: #comparisons : # swaps") and to what point of qsort, to make isort bail back, effectively, without troubles, and covering every situations. I considered the cases (swap_cnt == 0) is true, and they could be illustrated by figure [1],[2],[3] below. [1] |: ( 2 subarrays are randomized ) |: This is the worst case pointed out |: by Christopher Seiwald, and so |: isort has to work twice for random k |: arrays of n/2 elements, although e +* probability of this case might be y |: very very small in real life. |: Because of this low probability, |: I guess, FreeBSD qsort had been left |: as it is, for about 10 years since the |: 1'st issue, even though someone wuold +-+--- have been already aware of the thing. position [2] |: * (sorted almost) |: *This is the case isort is expected to | (X): * work efficiently, but if any elements |: *exist in region (X) and/or (Y), k |: "swap_cnt" could not detect the advantages. e +* And probability (X) and (Y) are empty, y |: might be very small, at least, in the | * : practical applications. This case might |* :(Y) occur when a lower significant key is | * : added for the dataset sorted already |* : with some higher keys (same keys exsist), ++ etc.. position [3] |: * (sorted completely) |: * Therefor, usefulness of "swap_cnt" is |: * guaranteed only for this case. |: * And so, it would be worth while to detect k |: * the array sorted completely, without risk e +* to degenerate, instead of "swap_cnt and isort", y | * : and the test loop is not so expensive for |* : usual arrangements because of breaking | * : out from loop at the 1'st pair out of order, | *: and at worst n - 2 comparisons. | * : ++ position Then an improvement would be : (1) delete the switching to "isort by swap_cnt", and add the routine detecting sorted completely. (2) adding (1), make the partitioning logic symmetrical, for dataset sorted in reversed order to be processed efficiently, e.g. malloc(size) for pivot instead of "swap(a, pm)" and some modifications required by this. By the way, I am doubting yet the efficiency of isort for small subarrays in quicksort, so I'd made some experiments for this and the results at ; http://www.mars.dti.ne.jp/~a-wada/qsortlib/ccn/qsortins.c http://www.mars.dti.ne.jp/~a-wada/qsortlib/ccn/qsortins.txt so that I didn't use isort in my qsorts.c at my 1'st post, and in the other codes for practical usages, at ; http://www.mars.dti.ne.jp/~a-wada/qsortlib.html -- Akira Wada To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: anybody love qsort.c?
Christopher Seiwald wrote... Archie's mod to qsort: | - if (swap_cnt == 0) { /* Switch to insertion sort */ | + if (n = 32 swap_cnt == 0) { /* Switch to insertion sort */ As Akira Wada points out, this eliminates the benefit of the optimization in the first place, which is to let isort take over if the data is largely sorted in the first place. Akira Wada's alternative mod: | + pl = (char *)a; pn = (char *)a + (n - 1) * es; | + while (pl pn cmp(pl, pl + es) = 0) pl += es; | + if (pl = pn) return; This is a little more comprehensive, but does throw in an extra pass of comparisons, which (I'm sure) someone would complain about. The alteration that I've tried and tested is to have the isort bail back to qsort if it does more than N swaps. I put N at 1024, which worked for my data :).This is almost guaranteed to do no more work than the current logic, because it it only gives up on the isort when it has evidence that the isort is doing too much work. I think it would be difficult, not impossible, to decide the threshold (Christopher Seiwald said, # swaps = 1024, and Tim Vanderhoek suggested, a ratio: #comparisons : # swaps) and to what point of qsort, to make isort bail back, effectively, without troubles, and covering every situations. I considered the cases (swap_cnt == 0) is true, and they could be illustrated by figure [1],[2],[3] below. [1] |: ( 2 subarrays are randomized ) |: This is the worst case pointed out |: by Christopher Seiwald, and so |: isort has to work twice for random k |: arrays of n/2 elements, although e +* probability of this case might be y |: very very small in real life. |: Because of this low probability, |: I guess, FreeBSD qsort had been left |: as it is, for about 10 years since the |: 1'st issue, even though someone wuold +-+--- have been already aware of the thing. position [2] |: * (sorted almost) |: *This is the case isort is expected to | (X): * work efficiently, but if any elements |: *exist in region (X) and/or (Y), k |: swap_cnt could not detect the advantages. e +* And probability (X) and (Y) are empty, y |: might be very small, at least, in the | * : practical applications. This case might |* :(Y) occur when a lower significant key is | * : added for the dataset sorted already |* : with some higher keys (same keys exsist), ++ etc.. position [3] |: * (sorted completely) |: * Therefor, usefulness of swap_cnt is |: * guaranteed only for this case. |: * And so, it would be worth while to detect k |: * the array sorted completely, without risk e +* to degenerate, instead of swap_cnt and isort, y | * : and the test loop is not so expensive for |* : usual arrangements because of breaking | * : out from loop at the 1'st pair out of order, | *: and at worst n - 2 comparisons. | * : ++ position Then an improvement would be : (1) delete the switching to isort by swap_cnt, and add the routine detecting sorted completely. (2) adding (1), make the partitioning logic symmetrical, for dataset sorted in reversed order to be processed efficiently, e.g. malloc(size) for pivot instead of swap(a, pm) and some modifications required by this. By the way, I am doubting yet the efficiency of isort for small subarrays in quicksort, so I'd made some experiments for this and the results at ; http://www.mars.dti.ne.jp/~a-wada/qsortlib/ccn/qsortins.c http://www.mars.dti.ne.jp/~a-wada/qsortlib/ccn/qsortins.txt so that I didn't use isort in my qsorts.c at my 1'st post, and in the other codes for practical usages, at ; http://www.mars.dti.ne.jp/~a-wada/qsortlib.html -- Akira Wada To Unsubscribe: send mail to majord...@freebsd.org with unsubscribe freebsd-hackers in the body of the message
Re: anybody love qsort.c?
[EMAIL PROTECTED] (John-Mark Gurney) writes: Christopher Seiwald scribbled this message on Aug 18: It's a pretty straightforward change to bypass the insertion sort for large subsets of the data. If no one has a strong love for qsort, I'll educate myself on how to make and contribute this change. why don't you implement this w/ the 5 element median selection qsort algorithm? my professor for cis413 talked about this algorithm and that it really is the fastest qsort algorithm... I don't have any pointers to a paper on this... but I might be able to dig some info up on it if you are interested... I don't think the point is eliminating worst-cases, but optimizing common cases, which in this case caused more worst-cases and thus needs fixing. Besides, the median selection chooses among more than 3 elements already (but only when the data set is large enough). For fixing worst cases, an introspective sort might be a good idea, i.e. do a quick sort but fall back to heap sort if a certain depth is exceeded (you know you're losing when the depth exceeds log n). This also has another advantage - if you limit the depth of the sort, you don't need to use the cpu stack for state, you can allocate a fixed-size array for the purpose. This probably isn't a real performance advantage for a C qsort implementation because of the overhead of calling cmp. It does, however, guarantee that sorting uses a reasonable amount of stack. Such an assumption isn't portable when using qsort(3), though. Expect to die if you do large qsorts from threads with small thread stacks. To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: anybody love qsort.c?
On Mon, Aug 23, 1999 at 12:28:32AM -0700, Christopher Seiwald wrote: The alteration that I've tried and tested is to have the isort bail back to qsort if it does more than N swaps. I put N at 1024, which Perhaps a ratio: #comparisons : # swaps If the ratio gets too high, then bail. -- This is my .signature which gets appended to the end of my messages. To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: anybody love qsort.c?
Archie's mod to qsort: | - if (swap_cnt == 0) { /* Switch to insertion sort */ | + if (n = 32 swap_cnt == 0) { /* Switch to insertion sort */ As Akira Wada points out, this eliminates the benefit of the optimization in the first place, which is to let isort take over if the data is largely sorted in the first place. Akira Wada's alternative mod: | + pl = (char *)a; pn = (char *)a + (n - 1) * es; | + while (pl pn cmp(pl, pl + es) = 0) pl += es; | + if (pl = pn) return; This is a little more comprehensive, but does throw in an extra pass of comparisons, which (I'm sure) someone would complain about. The alteration that I've tried and tested is to have the isort bail back to qsort if it does more than N swaps. I put N at 1024, which worked for my data :).This is almost guaranteed to do no more work than the current logic, because it it only gives up on the isort when it has evidence that the isort is doing too much work. Christopher To Unsubscribe: send mail to majord...@freebsd.org with unsubscribe freebsd-hackers in the body of the message
Re: anybody love qsort.c?
gurne...@efn.org (John-Mark Gurney) writes: Christopher Seiwald scribbled this message on Aug 18: It's a pretty straightforward change to bypass the insertion sort for large subsets of the data. If no one has a strong love for qsort, I'll educate myself on how to make and contribute this change. why don't you implement this w/ the 5 element median selection qsort algorithm? my professor for cis413 talked about this algorithm and that it really is the fastest qsort algorithm... I don't have any pointers to a paper on this... but I might be able to dig some info up on it if you are interested... I don't think the point is eliminating worst-cases, but optimizing common cases, which in this case caused more worst-cases and thus needs fixing. Besides, the median selection chooses among more than 3 elements already (but only when the data set is large enough). For fixing worst cases, an introspective sort might be a good idea, i.e. do a quick sort but fall back to heap sort if a certain depth is exceeded (you know you're losing when the depth exceeds log n). This also has another advantage - if you limit the depth of the sort, you don't need to use the cpu stack for state, you can allocate a fixed-size array for the purpose. This probably isn't a real performance advantage for a C qsort implementation because of the overhead of calling cmp. It does, however, guarantee that sorting uses a reasonable amount of stack. Such an assumption isn't portable when using qsort(3), though. Expect to die if you do large qsorts from threads with small thread stacks. To Unsubscribe: send mail to majord...@freebsd.org with unsubscribe freebsd-hackers in the body of the message
Re: anybody love qsort.c?
On Mon, Aug 23, 1999 at 12:28:32AM -0700, Christopher Seiwald wrote: The alteration that I've tried and tested is to have the isort bail back to qsort if it does more than N swaps. I put N at 1024, which Perhaps a ratio: #comparisons : # swaps If the ratio gets too high, then bail. -- This is my .signature which gets appended to the end of my messages. To Unsubscribe: send mail to majord...@freebsd.org with unsubscribe freebsd-hackers in the body of the message
Re: anybody love qsort.c?
Following my previous post: I wrote .. I believe a reversed dataset would be partitioned into two subpartitions sorted in order at the 1'st pass of the partitionings. Is this incorrect ? Sorry, I'd confirmed BSD qsort's partitioning logic does not guarantee that a reversed dataset would be partitioned into two subpartitions sorted in order at the 1'st pass of the partitionings. -Akira Wada To Unsubscribe: send mail to majord...@freebsd.org with unsubscribe freebsd-hackers in the body of the message
Re: anybody love qsort.c?
You can check the change by recompiling a few utils with the change: (find . -name \*.c | xargs grep -l qsort) ./bin/ps/ps.c ./contrib/gcc/*.c ./contrib/top/commands.c ./games/fortune/strfile/strfile.c ./gnu/usr.bin/sort/sort.c ./sbin/fsck/pass2.c The fsck one is a nice one. Just wack your /usr and reboot :-) Nick On Fri, 20 Aug 1999, Archie Cobbs wrote: Christopher Seiwald writes: But as I'm proposing a change to a fairly sensitive piece of code, I'd like to keep the change as modest as possible. How about this? Index: qsort.c === RCS file: /home/ncvs/src/lib/libc/stdlib/qsort.c,v retrieving revision 1.7 diff -u -r1.7 qsort.c --- qsort.c 1997/02/22 15:03:14 1.7 +++ qsort.c 1999/08/21 01:35:35 @@ -153,7 +153,7 @@ pb += es; pc -= es; } - if (swap_cnt == 0) { /* Switch to insertion sort */ + if (n = 32 swap_cnt == 0) { /* Switch to insertion sort */ for (pm = (char *)a + es; pm (char *)a + n * es; pm += es) for (pl = pm; pl (char *)a cmp(pl - es, pl) 0; pl -= es) -Archie ___ Archie Cobbs * Whistle Communications, Inc. * http://www.whistle.com To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message -- e-Mail: [EMAIL PROTECTED] To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: anybody love qsort.c?
Archie Cobbs wrote... Christopher Seiwald writes: But as I'm proposing a change to a fairly sensitive piece of code, I'd like to keep the change as modest as possible. How about this? Index: qsort.c === RCS file: /home/ncvs/src/lib/libc/stdlib/qsort.c,v retrieving revision 1.7 diff -u -r1.7 qsort.c --- qsort.c 1997/02/22 15:03:14 1.7 +++ qsort.c 1999/08/21 01:35:35 @@ -153,7 +153,7 @@ pb += es; pc -= es; } - if (swap_cnt == 0) { /* Switch to insertion sort */ + if (n = 32 swap_cnt == 0) { /* Switch to insertion sort */ for (pm = (char *)a + es; pm (char *)a + n * es; pm += es) for (pl = pm; pl (char *)a cmp(pl - es, pl) 0; pl -= es) -Archie ___ Archie Cobbs * Whistle Communications, Inc. * http://www.whistle.com I think your modification would avoid the degeneration indicated by Christopher Seiwald, but degrade the advantage for the dataset sorted completely or sorted in reversed order, down to nearly equal for random dataset. I added a routine before selecting pivot to test current partition sorted already and if so, to bypass partitioning. It works well for dataset sorted in order, but doesn't work for dataset in reversed order. I believe a reversed dataset would be partitioned into two subpartitions sorted in order at the 1'st pass of the partitionings. Is this incorrect ? -- for qsort.c,v 1.9 1998/06/30 11:05:11 @@ -102,2 +102,5 @@ swap_cnt = 0; + pl = (char *)a; pn = (char *)a + (n - 1) * es; + while (pl pn cmp(pl, pl + es) = 0) pl += es; + if (pl = pn) return; if (n 7) { -- -Akira Wada * $BOBED!!IK(B $B!?(B Akira Wada $BEl5~ETJ!@8;TIpB"LnBf(B 1-27-5 $B%k%MJ!@8(B B609 (tel,fax) 042-552-1143 E-mail : [EMAIL PROTECTED] * To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: anybody love qsort.c?
Following my previous post: I wrote .. I believe a reversed dataset would be partitioned into two subpartitions sorted in order at the 1'st pass of the partitionings. Is this incorrect ? Sorry, I'd confirmed BSD qsort's partitioning logic does not guarantee that "a reversed dataset would be partitioned into two subpartitions sorted in order at the 1'st pass of the partitionings". -Akira Wada To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: anybody love qsort.c?
You can check the change by recompiling a few utils with the change: (find . -name \*.c | xargs grep -l qsort) ./bin/ps/ps.c ./contrib/gcc/*.c ./contrib/top/commands.c ./games/fortune/strfile/strfile.c ./gnu/usr.bin/sort/sort.c ./sbin/fsck/pass2.c The fsck one is a nice one. Just wack your /usr and reboot :-) Nick On Fri, 20 Aug 1999, Archie Cobbs wrote: Christopher Seiwald writes: But as I'm proposing a change to a fairly sensitive piece of code, I'd like to keep the change as modest as possible. How about this? Index: qsort.c === RCS file: /home/ncvs/src/lib/libc/stdlib/qsort.c,v retrieving revision 1.7 diff -u -r1.7 qsort.c --- qsort.c 1997/02/22 15:03:14 1.7 +++ qsort.c 1999/08/21 01:35:35 @@ -153,7 +153,7 @@ pb += es; pc -= es; } - if (swap_cnt == 0) { /* Switch to insertion sort */ + if (n = 32 swap_cnt == 0) { /* Switch to insertion sort */ for (pm = (char *)a + es; pm (char *)a + n * es; pm += es) for (pl = pm; pl (char *)a cmp(pl - es, pl) 0; pl -= es) -Archie ___ Archie Cobbs * Whistle Communications, Inc. * http://www.whistle.com To Unsubscribe: send mail to majord...@freebsd.org with unsubscribe freebsd-hackers in the body of the message -- e-Mail: hi...@skylink.it To Unsubscribe: send mail to majord...@freebsd.org with unsubscribe freebsd-hackers in the body of the message
Re: anybody love qsort.c?
Archie Cobbs wrote... Christopher Seiwald writes: But as I'm proposing a change to a fairly sensitive piece of code, I'd like to keep the change as modest as possible. How about this? Index: qsort.c === RCS file: /home/ncvs/src/lib/libc/stdlib/qsort.c,v retrieving revision 1.7 diff -u -r1.7 qsort.c --- qsort.c 1997/02/22 15:03:14 1.7 +++ qsort.c 1999/08/21 01:35:35 @@ -153,7 +153,7 @@ pb += es; pc -= es; } - if (swap_cnt == 0) { /* Switch to insertion sort */ + if (n = 32 swap_cnt == 0) { /* Switch to insertion sort */ for (pm = (char *)a + es; pm (char *)a + n * es; pm += es) for (pl = pm; pl (char *)a cmp(pl - es, pl) 0; pl -= es) -Archie I think your modification would avoid the degeneration indicated by Christopher Seiwald, but degrade the advantage for the dataset sorted completely or sorted in reversed order, down to nearly equal for random dataset. I added a routine before selecting pivot to test current partition sorted already and if so, to bypass partitioning. It works well for dataset sorted in order, but doesn't work for dataset in reversed order. I believe a reversed dataset would be partitioned into two subpartitions sorted in order at the 1'st pass of the partitionigs. Is this incorrect ? -- for qsort.c,v 1.9 1998/06/30 11:05:11 @@ -102,2 +102,5 @@ swap_cnt = 0; + pl = (char *)a; pn = (char *)a + (n - 1) * es; + while (pl pn cmp(pl, pl + es) pl += es; + if (pl = pn) return; if (n 7) { -- -Akira Wada To Unsubscribe: send mail to majord...@freebsd.org with unsubscribe freebsd-hackers in the body of the message
Re: anybody love qsort.c?
Archie Cobbs wrote... Christopher Seiwald writes: But as I'm proposing a change to a fairly sensitive piece of code, I'd like to keep the change as modest as possible. How about this? Index: qsort.c === RCS file: /home/ncvs/src/lib/libc/stdlib/qsort.c,v retrieving revision 1.7 diff -u -r1.7 qsort.c --- qsort.c 1997/02/22 15:03:14 1.7 +++ qsort.c 1999/08/21 01:35:35 @@ -153,7 +153,7 @@ pb += es; pc -= es; } - if (swap_cnt == 0) { /* Switch to insertion sort */ + if (n = 32 swap_cnt == 0) { /* Switch to insertion sort */ for (pm = (char *)a + es; pm (char *)a + n * es; pm += es) for (pl = pm; pl (char *)a cmp(pl - es, pl) 0; pl -= es) -Archie ___ Archie Cobbs * Whistle Communications, Inc. * http://www.whistle.com I think your modification would avoid the degeneration indicated by Christopher Seiwald, but degrade the advantage for the dataset sorted completely or sorted in reversed order, down to nearly equal for random dataset. I added a routine before selecting pivot to test current partition sorted already and if so, to bypass partitioning. It works well for dataset sorted in order, but doesn't work for dataset in reversed order. I believe a reversed dataset would be partitioned into two subpartitions sorted in order at the 1'st pass of the partitionings. Is this incorrect ? -- for qsort.c,v 1.9 1998/06/30 11:05:11 @@ -102,2 +102,5 @@ swap_cnt = 0; + pl = (char *)a; pn = (char *)a + (n - 1) * es; + while (pl pn cmp(pl, pl + es) = 0) pl += es; + if (pl = pn) return; if (n 7) { -- -Akira Wada * 和田 彬 / Akira Wada 東京都福生市武蔵野台 1-27-5 ルネ福生 B609 (tel,fax) 042-552-1143 E-mail : a-w...@mars.dti.ne.jp * To Unsubscribe: send mail to majord...@freebsd.org with unsubscribe freebsd-hackers in the body of the message
Re: anybody love qsort.c?
Christopher Seiwald writes: But as I'm proposing a change to a fairly sensitive piece of code, I'd like to keep the change as modest as possible. How about this? Index: qsort.c === RCS file: /home/ncvs/src/lib/libc/stdlib/qsort.c,v retrieving revision 1.7 diff -u -r1.7 qsort.c --- qsort.c 1997/02/22 15:03:14 1.7 +++ qsort.c 1999/08/21 01:35:35 @@ -153,7 +153,7 @@ pb += es; pc -= es; } - if (swap_cnt == 0) { /* Switch to insertion sort */ + if (n = 32 swap_cnt == 0) { /* Switch to insertion sort */ for (pm = (char *)a + es; pm (char *)a + n * es; pm += es) for (pl = pm; pl (char *)a cmp(pl - es, pl) 0; pl -= es) -Archie ___ Archie Cobbs * Whistle Communications, Inc. * http://www.whistle.com To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: anybody love qsort.c?
Christopher Seiwald writes: But as I'm proposing a change to a fairly sensitive piece of code, I'd like to keep the change as modest as possible. How about this? Index: qsort.c === RCS file: /home/ncvs/src/lib/libc/stdlib/qsort.c,v retrieving revision 1.7 diff -u -r1.7 qsort.c --- qsort.c 1997/02/22 15:03:14 1.7 +++ qsort.c 1999/08/21 01:35:35 @@ -153,7 +153,7 @@ pb += es; pc -= es; } - if (swap_cnt == 0) { /* Switch to insertion sort */ + if (n = 32 swap_cnt == 0) { /* Switch to insertion sort */ for (pm = (char *)a + es; pm (char *)a + n * es; pm += es) for (pl = pm; pl (char *)a cmp(pl - es, pl) 0; pl -= es) -Archie ___ Archie Cobbs * Whistle Communications, Inc. * http://www.whistle.com To Unsubscribe: send mail to majord...@freebsd.org with unsubscribe freebsd-hackers in the body of the message
Re: anybody love qsort.c?
Doesn't the qsort just switch to isort *if* the partition to sort is short enough? Got to check it out, but afaik the size at that qsorts switch to isort is usually between 8 and 24, with 16 being most common - both halfs are shorter than 16, so they get isorted. Sander There is no love, no good, no happiness and no future - all these are just illusions. On Wed, 18 Aug 1999, Christopher Seiwald wrote: The FreeBSD qsort implementation has a rather nasty degenerate case. If you have data that partitions exactly about the median pivot, yet which is unsorted in either partition, you get get treated to an insertion sort. Example: 1 2 3 4 5 6 7 8 9 10 19 18 17 16 14 14 13 12 11 qsort picks 10 as the median, does no swaps on its first pass, and so decides to switch to an insertion sort. The idea is that if no swaps occur, the data is likely to be nearly already sorted and a good candidate for insertion sort. This turns out to be a (very) bad idea if you have some unsorted data buried in the middle of a (very) large dataset. The implementation claims to come from Bentley and McIlroy's paper, "Engineering a Sort Function," and this is largely true, but the insertion sort optimization(?) isn't in that paper. The GNU qsort.c only does an insertion sort if it is below a certain threshhold in size, and so never attempts such a sort on the whole dataset. (The GNU qsort does a number of other things pooh-poohed in the Bentley paper.) It's a pretty straightforward change to bypass the insertion sort for large subsets of the data. If no one has a strong love for qsort, I'll educate myself on how to make and contribute this change. Christopher Christopher Seiwald Perforce Software 1-510-864-7400 [EMAIL PROTECTED] f-f-f-fast SCM http://www.perforce.com To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: anybody love qsort.c?
Christopher Seiwald writes: The FreeBSD qsort implementation has a rather nasty degenerate case. If you have data that partitions exactly about the median pivot, yet which is unsorted in either partition, you get get treated to an insertion sort. Example: 1 2 3 4 5 6 7 8 9 10 19 18 17 16 14 14 13 12 11 qsort picks 10 as the median, does no swaps on its first pass, and so decides to switch to an insertion sort. The idea is that if no swaps occur, the data is likely to be nearly already sorted and a good candidate for insertion sort. This turns out to be a (very) bad idea if you have some unsorted data buried in the middle of a (very) large dataset. The implementation claims to come from Bentley and McIlroy's paper, "Engineering a Sort Function," and this is largely true, but the insertion sort optimization(?) isn't in that paper. The GNU qsort.c only does an insertion sort if it is below a certain threshhold in size, and so never attempts such a sort on the whole dataset. (The GNU qsort does a number of other things pooh-poohed in the Bentley paper.) It's a pretty straightforward change to bypass the insertion sort for large subsets of the data. If no one has a strong love for qsort, I'll educate myself on how to make and contribute this change. How about the code following sig. ? And the other codes and information on: http://www.mars.dti.ne.jp/~a-wada/qsortlib.html -- Akira Wada [EMAIL PROTECTED] - /* a-wada/qsortlib/ccn/qsorts.cMar. 30, 1999 */ /* Copyright (c) 1999 Akira Wada [EMAIL PROTECTED]*/ /* Quick sort compatible with "qsort (); in stdlib.h" */ /* efficiency by reducing the times of key comparisons and */ /* word_mode_swapping for word_aligned object*/ /* prevention from the degeneration caused by peculiar input */ /* This qsort is not stable, but could avoid the bad behavior by many*/ /* equal sort-keys. For stability, add a unique-key to compare-function. */ /* If the object to be sorted is an array of pointers to structs with*/ /* an unsigned integer key in ascending order, set the 3'rd argument */ /* to the key position in bytes and the 4'th to NULL, then */ /* radixsort would be applied. for example :*/ /* kps = 16; qsort (pv, n, kps, (ftp) 0); */ #include time.h #include stdlib.h void radixspi (void *, size_t, size_t); void srandom (unsigned long); /* if not in stdlib.h */ unsigned long random (); /* if not in stdlib.h */ #define MDT 16 /* threshold for median of the 5 or the more,MDT = 8 */ #define MDA 2 /* adjusting threshold slightly, MDT + MDA = 4 */ #define MDM 2 /* multiplier for threshold, MDM = 2 */ #define DGT256 /* threshold for checking degeneration*/ #define DGV 16 /* degeneration assumed if the smaller DGL */ #define DGM 0 /* threshold for selecting no median */ #define DGS 2 /* threshold for samples distributed uniformly*/ #define DGU 4 /* threshold for sampling at random */ #define DGL ((unsigned) psz / DGV) * rl #define defdgn int dgn = 0, nsp, itv #define ckdgnr(s, e) if (psz = DGT s + DGL e) dgn++ #define no_med (psz MDT DGM dgn dgn = DGS) #define symptm (psz MDT dgn DGS) #define xcsmpl() do { \ if (dgn = DGU) { /* samples distributed uniformly */\ nsp = 3; while (psz thr) thr *= MDM, nsp += 2; \ itv = ((unsigned) psz / (nsp - 1)) * rl; k = l, n = r; \ for (thr = MDT; psz thr; thr *= MDM) { \ i += rl, k += itv; swap (i, k);\ j -= rl, n -= itv; swap (n, j);}} \ else { /* samples at random */\ if (dgn == DGU + 1) dgn++, srandom (time (0)); \ for (thr = (unsigned) thr / MDM; psz thr; thr *= MDM) { \ rdmsmpl (i, i, j); i += rl;\ rdmsmpl (j, i, j); j -= rl;} \ rdmsmpl (m, i, j);}\ i = l, j = r, thr = MDT;} while (0) #define rdmsmpl(x, s, e) if (x != (n = s + rl * ( \ random () % ((unsigned) (e - (s)) / rl + 1 swap (x, n) #define swpwrk int t, w, z = sizeof (int), alg = ((int) av |
Re: anybody love qsort.c?
| How about the code following sig. ? | And the other codes and information on: | | http://www.mars.dti.ne.jp/~a-wada/qsortlib.html Unfortunately, I only use about 5% of my brain, and am incapable of convincing myself (or anyone else) of the correctness or efficiency of your qsort algorithm. In my quick attempt to educate myself, I did read your page and found it interesting. But as I'm proposing a change to a fairly sensitive piece of code, I'd like to keep the change as modest as possible. Christopher To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: anybody love qsort.c?
Doesn't the qsort just switch to isort *if* the partition to sort is short enough? Got to check it out, but afaik the size at that qsorts switch to isort is usually between 8 and 24, with 16 being most common - both halfs are shorter than 16, so they get isorted. Sander There is no love, no good, no happiness and no future - all these are just illusions. On Wed, 18 Aug 1999, Christopher Seiwald wrote: The FreeBSD qsort implementation has a rather nasty degenerate case. If you have data that partitions exactly about the median pivot, yet which is unsorted in either partition, you get get treated to an insertion sort. Example: 1 2 3 4 5 6 7 8 9 10 19 18 17 16 14 14 13 12 11 qsort picks 10 as the median, does no swaps on its first pass, and so decides to switch to an insertion sort. The idea is that if no swaps occur, the data is likely to be nearly already sorted and a good candidate for insertion sort. This turns out to be a (very) bad idea if you have some unsorted data buried in the middle of a (very) large dataset. The implementation claims to come from Bentley and McIlroy's paper, Engineering a Sort Function, and this is largely true, but the insertion sort optimization(?) isn't in that paper. The GNU qsort.c only does an insertion sort if it is below a certain threshhold in size, and so never attempts such a sort on the whole dataset. (The GNU qsort does a number of other things pooh-poohed in the Bentley paper.) It's a pretty straightforward change to bypass the insertion sort for large subsets of the data. If no one has a strong love for qsort, I'll educate myself on how to make and contribute this change. Christopher Christopher Seiwald Perforce Software 1-510-864-7400 seiw...@perforce.com f-f-f-fast SCM http://www.perforce.com To Unsubscribe: send mail to majord...@freebsd.org with unsubscribe freebsd-hackers in the body of the message To Unsubscribe: send mail to majord...@freebsd.org with unsubscribe freebsd-hackers in the body of the message
Re: anybody love qsort.c?
Narvi writes: Doesn't the qsort just switch to isort *if* the partition to sort is short enough? That's exactly Christopher's point. It should do this but it doesn't. The code is complex but from a quick glance it appears that the decision to switch to insertion sort does not depend on the total array length. -Archie ___ Archie Cobbs * Whistle Communications, Inc. * http://www.whistle.com To Unsubscribe: send mail to majord...@freebsd.org with unsubscribe freebsd-hackers in the body of the message
Re: anybody love qsort.c?
Answers to sundry comments: | why don't you implement this w/ the 5 element median selection qsort | algorithm? my professor for cis413 talked about this algorithm and | that it really is the fastest qsort algorithm... qsort algorithms are like stock market tips: only good in retrospect. The bentley scheme uses a 9 element median selection (80% better?). | Doesn't the qsort just switch to isort *if* the partition to sort is short | enough? Yes -- if it is 7 elements, it always does isort. But the problem is that it will switch to isort if it believes the data is mostly sorted, and it comes to this conclusion incorrectly. Unfortunately, I haven't found a sure-fire way to improve this. | The code is complex but from a quick glance it appears that the decision | to switch to insertion sort does not depend on the total array length. Right. Actually, once you get into it, the code is relatively simple. It is one of the features of the Bentley code. Christopher To Unsubscribe: send mail to majord...@freebsd.org with unsubscribe freebsd-hackers in the body of the message
Re: anybody love qsort.c?
Christopher Seiwald writes: The FreeBSD qsort implementation has a rather nasty degenerate case. If you have data that partitions exactly about the median pivot, yet which is unsorted in either partition, you get get treated to an insertion sort. Example: 1 2 3 4 5 6 7 8 9 10 19 18 17 16 14 14 13 12 11 qsort picks 10 as the median, does no swaps on its first pass, and so decides to switch to an insertion sort. The idea is that if no swaps occur, the data is likely to be nearly already sorted and a good candidate for insertion sort. This turns out to be a (very) bad idea if you have some unsorted data buried in the middle of a (very) large dataset. The implementation claims to come from Bentley and McIlroy's paper, Engineering a Sort Function, and this is largely true, but the insertion sort optimization(?) isn't in that paper. The GNU qsort.c only does an insertion sort if it is below a certain threshhold in size, and so never attempts such a sort on the whole dataset. (The GNU qsort does a number of other things pooh-poohed in the Bentley paper.) It's a pretty straightforward change to bypass the insertion sort for large subsets of the data. If no one has a strong love for qsort, I'll educate myself on how to make and contribute this change. How about the code following sig. ? And the other codes and information on: http://www.mars.dti.ne.jp/~a-wada/qsortlib.html -- Akira Wada a-w...@mars.dti.ne.jp - /* a-wada/qsortlib/ccn/qsorts.cMar. 30, 1999 */ /* Copyright (c) 1999 Akira Wada a-w...@mars.dti.ne.jp*/ /* Quick sort compatible with qsort (); in stdlib.h */ /* efficiency by reducing the times of key comparisons and */ /* word_mode_swapping for word_aligned object*/ /* prevention from the degeneration caused by peculiar input */ /* This qsort is not stable, but could avoid the bad behavior by many*/ /* equal sort-keys. For stability, add a unique-key to compare-function. */ /* If the object to be sorted is an array of pointers to structs with*/ /* an unsigned integer key in ascending order, set the 3'rd argument */ /* to the key position in bytes and the 4'th to NULL, then */ /* radixsort would be applied. for example :*/ /* kps = 16; qsort (pv, n, kps, (ftp) 0); */ #include time.h #include stdlib.h void radixspi (void *, size_t, size_t); void srandom (unsigned long); /* if not in stdlib.h */ unsigned long random (); /* if not in stdlib.h */ #define MDT 16 /* threshold for median of the 5 or the more,MDT = 8 */ #define MDA 2 /* adjusting threshold slightly, MDT + MDA = 4 */ #define MDM 2 /* multiplier for threshold, MDM = 2 */ #define DGT256 /* threshold for checking degeneration*/ #define DGV 16 /* degeneration assumed if the smaller DGL */ #define DGM 0 /* threshold for selecting no median */ #define DGS 2 /* threshold for samples distributed uniformly*/ #define DGU 4 /* threshold for sampling at random */ #define DGL ((unsigned) psz / DGV) * rl #define defdgn int dgn = 0, nsp, itv #define ckdgnr(s, e) if (psz = DGT s + DGL e) dgn++ #define no_med (psz MDT DGM dgn dgn = DGS) #define symptm (psz MDT dgn DGS) #define xcsmpl() do { \ if (dgn = DGU) { /* samples distributed uniformly */\ nsp = 3; while (psz thr) thr *= MDM, nsp += 2; \ itv = ((unsigned) psz / (nsp - 1)) * rl; k = l, n = r; \ for (thr = MDT; psz thr; thr *= MDM) { \ i += rl, k += itv; swap (i, k);\ j -= rl, n -= itv; swap (n, j);}} \ else { /* samples at random */\ if (dgn == DGU + 1) dgn++, srandom (time (0)); \ for (thr = (unsigned) thr / MDM; psz thr; thr *= MDM) { \ rdmsmpl (i, i, j); i += rl;\ rdmsmpl (j, i, j); j -= rl;} \ rdmsmpl (m, i, j);}\ i = l, j = r, thr = MDT;} while (0) #define rdmsmpl(x, s, e) if (x != (n = s + rl * ( \ random () % ((unsigned) (e - (s)) / rl + 1 swap (x, n) #define swpwrk int t, w, z = sizeof (int), alg = ((int) av |
Re: anybody love qsort.c?
| How about the code following sig. ? | And the other codes and information on: | | http://www.mars.dti.ne.jp/~a-wada/qsortlib.html Unfortunately, I only use about 5% of my brain, and am incapable of convincing myself (or anyone else) of the correctness or efficiency of your qsort algorithm. In my quick attempt to educate myself, I did read your page and found it interesting. But as I'm proposing a change to a fairly sensitive piece of code, I'd like to keep the change as modest as possible. Christopher To Unsubscribe: send mail to majord...@freebsd.org with unsubscribe freebsd-hackers in the body of the message
anybody love qsort.c?
The FreeBSD qsort implementation has a rather nasty degenerate case. If you have data that partitions exactly about the median pivot, yet which is unsorted in either partition, you get get treated to an insertion sort. Example: 1 2 3 4 5 6 7 8 9 10 19 18 17 16 14 14 13 12 11 qsort picks 10 as the median, does no swaps on its first pass, and so decides to switch to an insertion sort. The idea is that if no swaps occur, the data is likely to be nearly already sorted and a good candidate for insertion sort. This turns out to be a (very) bad idea if you have some unsorted data buried in the middle of a (very) large dataset. The implementation claims to come from Bentley and McIlroy's paper, "Engineering a Sort Function," and this is largely true, but the insertion sort optimization(?) isn't in that paper. The GNU qsort.c only does an insertion sort if it is below a certain threshhold in size, and so never attempts such a sort on the whole dataset. (The GNU qsort does a number of other things pooh-poohed in the Bentley paper.) It's a pretty straightforward change to bypass the insertion sort for large subsets of the data. If no one has a strong love for qsort, I'll educate myself on how to make and contribute this change. Christopher Christopher Seiwald Perforce Software 1-510-864-7400 [EMAIL PROTECTED] f-f-f-fast SCM http://www.perforce.com To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: anybody love qsort.c?
Christopher Seiwald writes: The FreeBSD qsort implementation has a rather nasty degenerate case. If you have data that partitions exactly about the median pivot, yet which is unsorted in either partition, you get get treated to an insertion sort. Example: 1 2 3 4 5 6 7 8 9 10 19 18 17 16 14 14 13 12 11 qsort picks 10 as the median, does no swaps on its first pass, and so decides to switch to an insertion sort. The idea is that if no swaps occur, the data is likely to be nearly already sorted and a good candidate for insertion sort. This turns out to be a (very) bad idea if you have some unsorted data buried in the middle of a (very) large dataset. The implementation claims to come from Bentley and McIlroy's paper, "Engineering a Sort Function," and this is largely true, but the insertion sort optimization(?) isn't in that paper. The GNU qsort.c only does an insertion sort if it is below a certain threshhold in size, and so never attempts such a sort on the whole dataset. (The GNU qsort does a number of other things pooh-poohed in the Bentley paper.) It's a pretty straightforward change to bypass the insertion sort for large subsets of the data. If no one has a strong love for qsort, I'll educate myself on how to make and contribute this change. Sounds good to me.. let's fix it. -Archie ___ Archie Cobbs * Whistle Communications, Inc. * http://www.whistle.com To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: anybody love qsort.c?
Christopher Seiwald scribbled this message on Aug 18: It's a pretty straightforward change to bypass the insertion sort for large subsets of the data. If no one has a strong love for qsort, I'll educate myself on how to make and contribute this change. why don't you implement this w/ the 5 element median selection qsort algorithm? my professor for cis413 talked about this algorithm and that it really is the fastest qsort algorithm... I don't have any pointers to a paper on this... but I might be able to dig some info up on it if you are interested... -- John-Mark Gurney Voice: +1 541 684 8449 Cu Networking P.O. Box 5693, 97405 "The soul contains in itself the event that shall presently befall it. The event is only the actualizing of its thought." -- Ralph Waldo Emerson To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
anybody love qsort.c?
The FreeBSD qsort implementation has a rather nasty degenerate case. If you have data that partitions exactly about the median pivot, yet which is unsorted in either partition, you get get treated to an insertion sort. Example: 1 2 3 4 5 6 7 8 9 10 19 18 17 16 14 14 13 12 11 qsort picks 10 as the median, does no swaps on its first pass, and so decides to switch to an insertion sort. The idea is that if no swaps occur, the data is likely to be nearly already sorted and a good candidate for insertion sort. This turns out to be a (very) bad idea if you have some unsorted data buried in the middle of a (very) large dataset. The implementation claims to come from Bentley and McIlroy's paper, Engineering a Sort Function, and this is largely true, but the insertion sort optimization(?) isn't in that paper. The GNU qsort.c only does an insertion sort if it is below a certain threshhold in size, and so never attempts such a sort on the whole dataset. (The GNU qsort does a number of other things pooh-poohed in the Bentley paper.) It's a pretty straightforward change to bypass the insertion sort for large subsets of the data. If no one has a strong love for qsort, I'll educate myself on how to make and contribute this change. Christopher Christopher Seiwald Perforce Software 1-510-864-7400 seiw...@perforce.com f-f-f-fast SCM http://www.perforce.com To Unsubscribe: send mail to majord...@freebsd.org with unsubscribe freebsd-hackers in the body of the message
Re: anybody love qsort.c?
Christopher Seiwald writes: The FreeBSD qsort implementation has a rather nasty degenerate case. If you have data that partitions exactly about the median pivot, yet which is unsorted in either partition, you get get treated to an insertion sort. Example: 1 2 3 4 5 6 7 8 9 10 19 18 17 16 14 14 13 12 11 qsort picks 10 as the median, does no swaps on its first pass, and so decides to switch to an insertion sort. The idea is that if no swaps occur, the data is likely to be nearly already sorted and a good candidate for insertion sort. This turns out to be a (very) bad idea if you have some unsorted data buried in the middle of a (very) large dataset. The implementation claims to come from Bentley and McIlroy's paper, Engineering a Sort Function, and this is largely true, but the insertion sort optimization(?) isn't in that paper. The GNU qsort.c only does an insertion sort if it is below a certain threshhold in size, and so never attempts such a sort on the whole dataset. (The GNU qsort does a number of other things pooh-poohed in the Bentley paper.) It's a pretty straightforward change to bypass the insertion sort for large subsets of the data. If no one has a strong love for qsort, I'll educate myself on how to make and contribute this change. Sounds good to me.. let's fix it. -Archie ___ Archie Cobbs * Whistle Communications, Inc. * http://www.whistle.com To Unsubscribe: send mail to majord...@freebsd.org with unsubscribe freebsd-hackers in the body of the message
Re: anybody love qsort.c?
Christopher Seiwald scribbled this message on Aug 18: It's a pretty straightforward change to bypass the insertion sort for large subsets of the data. If no one has a strong love for qsort, I'll educate myself on how to make and contribute this change. why don't you implement this w/ the 5 element median selection qsort algorithm? my professor for cis413 talked about this algorithm and that it really is the fastest qsort algorithm... I don't have any pointers to a paper on this... but I might be able to dig some info up on it if you are interested... -- John-Mark Gurney Voice: +1 541 684 8449 Cu Networking P.O. Box 5693, 97405 The soul contains in itself the event that shall presently befall it. The event is only the actualizing of its thought. -- Ralph Waldo Emerson To Unsubscribe: send mail to majord...@freebsd.org with unsubscribe freebsd-hackers in the body of the message