Re: anybody love qsort.c?

1999-08-26 Thread Christopher Seiwald

Apparently, someone does love qsort.

Akira Wada wrote:

| I think it would be difficult, not impossible, to decide the threshold
| (Christopher Seiwald said, "# swaps = 1024", and Tim Vanderhoek suggested,
| "a ratio: #comparisons : # swaps") and to what point of qsort, to make
| isort bail back, effectively, without troubles, and covering every situations.

The value 1024 is arbitrary, to be sure, but it is in a very safe range:
at 0, the BSD addition of isort to Bentley's qsort is left intact; at
infinity, it is entirely disabled.  Any number in between will exhibit
at worst the behavior of one of the two.

The number selection doesn't have to be fancy: as Akira pointed out,
my problem seems to be a fringe case that no one else has ever noted.
I am therefore more or less selecting the # for me and my data.

| Then an improvement would be :
|   (1) delete the switching to "isort by swap_cnt",
|  and add the routine detecting sorted completely.

Actually, if the data is sorted then the isort does just as many
comparisons (and the same zero swaps) as a sort detection routine.

|   (2) adding (1), make the partitioning logic symmetrical,
|  for dataset sorted in reversed order to be processed efficiently,
|  e.g. malloc(size) for pivot instead of "swap(a, pm)"
|  and some modifications required by this.

This starts to get like real code.  My intent is not to reengineer qsort,
or at all challenge Bentley's algorithm or BSD's addition, but rather
temper the latter with a safe, understandable (for me) alteration.

By the way, thanks (Akira) for your very nice depiction of the
problem of partially sorted data.

Christopher


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message



Re: anybody love qsort.c?

1999-08-26 Thread Christopher Seiwald
Apparently, someone does love qsort.

Akira Wada wrote:

| I think it would be difficult, not impossible, to decide the threshold
| (Christopher Seiwald said, # swaps = 1024, and Tim Vanderhoek suggested,
| a ratio: #comparisons : # swaps) and to what point of qsort, to make
| isort bail back, effectively, without troubles, and covering every situations.

The value 1024 is arbitrary, to be sure, but it is in a very safe range:
at 0, the BSD addition of isort to Bentley's qsort is left intact; at
infinity, it is entirely disabled.  Any number in between will exhibit
at worst the behavior of one of the two.

The number selection doesn't have to be fancy: as Akira pointed out,
my problem seems to be a fringe case that no one else has ever noted.
I am therefore more or less selecting the # for me and my data.

| Then an improvement would be :
|   (1) delete the switching to isort by swap_cnt,
|  and add the routine detecting sorted completely.

Actually, if the data is sorted then the isort does just as many
comparisons (and the same zero swaps) as a sort detection routine.

|   (2) adding (1), make the partitioning logic symmetrical,
|  for dataset sorted in reversed order to be processed efficiently,
|  e.g. malloc(size) for pivot instead of swap(a, pm)
|  and some modifications required by this.

This starts to get like real code.  My intent is not to reengineer qsort,
or at all challenge Bentley's algorithm or BSD's addition, but rather
temper the latter with a safe, understandable (for me) alteration.

By the way, thanks (Akira) for your very nice depiction of the
problem of partially sorted data.

Christopher


To Unsubscribe: send mail to majord...@freebsd.org
with unsubscribe freebsd-hackers in the body of the message



Re: anybody love qsort.c?

1999-08-25 Thread Akira Wada


Christopher Seiwald wrote...
 Archie's mod to qsort:
 
 |  - if (swap_cnt == 0) {  /* Switch to insertion sort */
 |  + if (n = 32  swap_cnt == 0) {  /* Switch to insertion sort */
 
 As Akira Wada points out, this eliminates the benefit of the optimization
 in the first place, which is to let isort take over if the data is largely
 sorted in the first place.
 
 Akira Wada's alternative mod:
 
 | +   pl = (char *)a; pn = (char *)a + (n - 1) * es;
 | +   while (pl  pn  cmp(pl, pl + es) = 0) pl += es;
 | +   if (pl = pn) return;
 
 This is a little more comprehensive, but does throw in an extra pass
 of comparisons, which (I'm sure) someone would complain about.
 
 The alteration that I've tried and tested is to have the isort bail
 back to qsort if it does more than N swaps.  I put N at 1024, which
 worked for my data :).This is almost guaranteed to do no more work
 than the current logic, because it it only gives up on the isort when
 it has evidence that the isort is doing too much work.
 
I think it would be difficult, not impossible, to decide the threshold
(Christopher Seiwald said, "# swaps = 1024", and Tim Vanderhoek suggested,
"a ratio: #comparisons : # swaps") and to what point of qsort, to make
isort bail back, effectively, without troubles, and covering every situations.

I considered the cases (swap_cnt == 0) is true, and they could be
illustrated by figure [1],[2],[3] below.

[1] |:   ( 2 subarrays are randomized )
|:   This is the worst case pointed out
|:   by Christopher Seiwald, and so
|:   isort has to work twice for random
  k |:   arrays of n/2 elements, although
  e +*   probability of this case might be
  y |:   very very small in real life.
|:   Because of this low probability, 
|:   I guess, FreeBSD qsort had been left
|:   as it is, for about 10 years since the
|:   1'st issue, even though someone wuold
+-+---   have been already aware of the thing.
position

[2] |:   *  (sorted almost)
|: *This is the case isort is expected to
| (X):   *  work efficiently, but if any elements
|: *exist in region (X) and/or (Y),
  k |:  "swap_cnt" could not detect the advantages.
  e +*  And probability (X) and (Y) are empty,
  y |:  might be very small, at least, in the
|  * :  practical applications. This case might
|*   :(Y)   occur when a lower significant key is 
|  * :  added for the dataset sorted already
|*   :  with some higher keys (same keys exsist),
++  etc..
  position

[3] |: *   (sorted completely)
|:   * Therefor, usefulness of "swap_cnt" is
|: *   guaranteed only for this case.
|:   * And so, it would be worth while to detect
  k |: *   the array sorted completely, without risk
  e +* to degenerate, instead of "swap_cnt and isort", 
  y |  * : and the test loop is not so expensive for 
|*   : usual arrangements because of breaking
|  * : out from loop at the 1'st pair out of order,
|   *: and at worst n - 2 comparisons.
| *  :
++
  position

Then an improvement would be :
  (1) delete the switching to "isort by swap_cnt",
 and add the routine detecting sorted completely.
  (2) adding (1), make the partitioning logic symmetrical,
 for dataset sorted in reversed order to be processed efficiently,
 e.g. malloc(size) for pivot instead of "swap(a, pm)"
 and some modifications required by this.

By the way, I am doubting yet the efficiency of isort for small subarrays
in quicksort, so I'd made some experiments for this and the results at ;
   http://www.mars.dti.ne.jp/~a-wada/qsortlib/ccn/qsortins.c
   http://www.mars.dti.ne.jp/~a-wada/qsortlib/ccn/qsortins.txt
so that I didn't use isort in my qsorts.c at my 1'st post,
and in the other codes for practical usages, at ;
   http://www.mars.dti.ne.jp/~a-wada/qsortlib.html
--
Akira Wada



To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message



Re: anybody love qsort.c?

1999-08-25 Thread Akira Wada

Christopher Seiwald wrote...
 Archie's mod to qsort:
 
 |  - if (swap_cnt == 0) {  /* Switch to insertion sort */
 |  + if (n = 32  swap_cnt == 0) {  /* Switch to insertion sort */
 
 As Akira Wada points out, this eliminates the benefit of the optimization
 in the first place, which is to let isort take over if the data is largely
 sorted in the first place.
 
 Akira Wada's alternative mod:
 
 | +   pl = (char *)a; pn = (char *)a + (n - 1) * es;
 | +   while (pl  pn  cmp(pl, pl + es) = 0) pl += es;
 | +   if (pl = pn) return;
 
 This is a little more comprehensive, but does throw in an extra pass
 of comparisons, which (I'm sure) someone would complain about.
 
 The alteration that I've tried and tested is to have the isort bail
 back to qsort if it does more than N swaps.  I put N at 1024, which
 worked for my data :).This is almost guaranteed to do no more work
 than the current logic, because it it only gives up on the isort when
 it has evidence that the isort is doing too much work.
 
I think it would be difficult, not impossible, to decide the threshold
(Christopher Seiwald said, # swaps = 1024, and Tim Vanderhoek suggested,
a ratio: #comparisons : # swaps) and to what point of qsort, to make
isort bail back, effectively, without troubles, and covering every situations.

I considered the cases (swap_cnt == 0) is true, and they could be
illustrated by figure [1],[2],[3] below.

[1] |:   ( 2 subarrays are randomized )
|:   This is the worst case pointed out
|:   by Christopher Seiwald, and so
|:   isort has to work twice for random
  k |:   arrays of n/2 elements, although
  e +*   probability of this case might be
  y |:   very very small in real life.
|:   Because of this low probability, 
|:   I guess, FreeBSD qsort had been left
|:   as it is, for about 10 years since the
|:   1'st issue, even though someone wuold
+-+---   have been already aware of the thing.
position

[2] |:   *  (sorted almost)
|: *This is the case isort is expected to
| (X):   *  work efficiently, but if any elements
|: *exist in region (X) and/or (Y),
  k |:  swap_cnt could not detect the advantages.
  e +*  And probability (X) and (Y) are empty,
  y |:  might be very small, at least, in the
|  * :  practical applications. This case might
|*   :(Y)   occur when a lower significant key is 
|  * :  added for the dataset sorted already
|*   :  with some higher keys (same keys exsist),
++  etc..
  position

[3] |: *   (sorted completely)
|:   * Therefor, usefulness of swap_cnt is
|: *   guaranteed only for this case.
|:   * And so, it would be worth while to detect
  k |: *   the array sorted completely, without risk
  e +* to degenerate, instead of swap_cnt and isort, 
  y |  * : and the test loop is not so expensive for 
|*   : usual arrangements because of breaking
|  * : out from loop at the 1'st pair out of order,
|   *: and at worst n - 2 comparisons.
| *  :
++
  position

Then an improvement would be :
  (1) delete the switching to isort by swap_cnt,
 and add the routine detecting sorted completely.
  (2) adding (1), make the partitioning logic symmetrical,
 for dataset sorted in reversed order to be processed efficiently,
 e.g. malloc(size) for pivot instead of swap(a, pm)
 and some modifications required by this.

By the way, I am doubting yet the efficiency of isort for small subarrays
in quicksort, so I'd made some experiments for this and the results at ;
   http://www.mars.dti.ne.jp/~a-wada/qsortlib/ccn/qsortins.c
   http://www.mars.dti.ne.jp/~a-wada/qsortlib/ccn/qsortins.txt
so that I didn't use isort in my qsorts.c at my 1'st post,
and in the other codes for practical usages, at ;
   http://www.mars.dti.ne.jp/~a-wada/qsortlib.html
--
Akira Wada



To Unsubscribe: send mail to majord...@freebsd.org
with unsubscribe freebsd-hackers in the body of the message



Re: anybody love qsort.c?

1999-08-23 Thread Ville-Pertti Keinonen


[EMAIL PROTECTED] (John-Mark Gurney) writes:

 Christopher Seiwald scribbled this message on Aug 18:
  It's a pretty straightforward change to bypass the insertion sort for
  large subsets of the data.  If no one has a strong love for qsort, I'll
  educate myself on how to make and contribute this change.
 
 why don't you implement this w/ the 5 element median selection qsort
 algorithm?  my professor for cis413 talked about this algorithm and
 that it really is the fastest qsort algorithm...   I don't have any
 pointers to a paper on this... but I might be able to dig some info up
 on it if you are interested...

I don't think the point is eliminating worst-cases, but optimizing
common cases, which in this case caused more worst-cases and thus
needs fixing.  Besides, the median selection chooses among more than 3
elements already (but only when the data set is large enough).

For fixing worst cases, an introspective sort might be a good idea,
i.e. do a quick sort but fall back to heap sort if a certain depth is
exceeded (you know you're losing when the depth exceeds log n).  This
also has another advantage - if you limit the depth of the sort, you
don't need to use the cpu stack for state, you can allocate a
fixed-size array for the purpose.  This probably isn't a real
performance advantage for a C qsort implementation because of the
overhead of calling cmp.  It does, however, guarantee that sorting
uses a reasonable amount of stack.  Such an assumption isn't portable
when using qsort(3), though.  Expect to die if you do large qsorts
from threads with small thread stacks.


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message



Re: anybody love qsort.c?

1999-08-23 Thread Tim Vanderhoek

On Mon, Aug 23, 1999 at 12:28:32AM -0700, Christopher Seiwald wrote:
 
 The alteration that I've tried and tested is to have the isort bail
 back to qsort if it does more than N swaps.  I put N at 1024, which

Perhaps a ratio:  #comparisons : # swaps

If the ratio gets too high, then bail.


-- 
This is my .signature which gets appended to the end of my messages.


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message



Re: anybody love qsort.c?

1999-08-23 Thread Christopher Seiwald
Archie's mod to qsort:

|  -   if (swap_cnt == 0) {  /* Switch to insertion sort */
|  +   if (n = 32  swap_cnt == 0) {  /* Switch to insertion sort */

As Akira Wada points out, this eliminates the benefit of the optimization
in the first place, which is to let isort take over if the data is largely
sorted in the first place.

Akira Wada's alternative mod:

| +   pl = (char *)a; pn = (char *)a + (n - 1) * es;
| +   while (pl  pn  cmp(pl, pl + es) = 0) pl += es;
| +   if (pl = pn) return;

This is a little more comprehensive, but does throw in an extra pass
of comparisons, which (I'm sure) someone would complain about.

The alteration that I've tried and tested is to have the isort bail
back to qsort if it does more than N swaps.  I put N at 1024, which
worked for my data :).This is almost guaranteed to do no more work
than the current logic, because it it only gives up on the isort when
it has evidence that the isort is doing too much work.

Christopher


To Unsubscribe: send mail to majord...@freebsd.org
with unsubscribe freebsd-hackers in the body of the message



Re: anybody love qsort.c?

1999-08-23 Thread Ville-Pertti Keinonen

gurne...@efn.org (John-Mark Gurney) writes:

 Christopher Seiwald scribbled this message on Aug 18:
  It's a pretty straightforward change to bypass the insertion sort for
  large subsets of the data.  If no one has a strong love for qsort, I'll
  educate myself on how to make and contribute this change.
 
 why don't you implement this w/ the 5 element median selection qsort
 algorithm?  my professor for cis413 talked about this algorithm and
 that it really is the fastest qsort algorithm...   I don't have any
 pointers to a paper on this... but I might be able to dig some info up
 on it if you are interested...

I don't think the point is eliminating worst-cases, but optimizing
common cases, which in this case caused more worst-cases and thus
needs fixing.  Besides, the median selection chooses among more than 3
elements already (but only when the data set is large enough).

For fixing worst cases, an introspective sort might be a good idea,
i.e. do a quick sort but fall back to heap sort if a certain depth is
exceeded (you know you're losing when the depth exceeds log n).  This
also has another advantage - if you limit the depth of the sort, you
don't need to use the cpu stack for state, you can allocate a
fixed-size array for the purpose.  This probably isn't a real
performance advantage for a C qsort implementation because of the
overhead of calling cmp.  It does, however, guarantee that sorting
uses a reasonable amount of stack.  Such an assumption isn't portable
when using qsort(3), though.  Expect to die if you do large qsorts
from threads with small thread stacks.


To Unsubscribe: send mail to majord...@freebsd.org
with unsubscribe freebsd-hackers in the body of the message



Re: anybody love qsort.c?

1999-08-23 Thread Tim Vanderhoek
On Mon, Aug 23, 1999 at 12:28:32AM -0700, Christopher Seiwald wrote:
 
 The alteration that I've tried and tested is to have the isort bail
 back to qsort if it does more than N swaps.  I put N at 1024, which

Perhaps a ratio:  #comparisons : # swaps

If the ratio gets too high, then bail.


-- 
This is my .signature which gets appended to the end of my messages.


To Unsubscribe: send mail to majord...@freebsd.org
with unsubscribe freebsd-hackers in the body of the message



Re: anybody love qsort.c?

1999-08-22 Thread Akira Wada
Following my previous post:
I wrote ..
I believe a reversed dataset would be partitioned
into two subpartitions sorted in order at the 1'st pass of 
the partitionings. Is this incorrect ?

Sorry, I'd confirmed BSD qsort's partitioning logic does not
guarantee that a reversed dataset would be partitioned
into two subpartitions sorted in order at the 1'st pass
of the partitionings.

-Akira Wada


To Unsubscribe: send mail to majord...@freebsd.org
with unsubscribe freebsd-hackers in the body of the message



Re: anybody love qsort.c?

1999-08-21 Thread Nick Hibma


You can check the change by recompiling a few utils with the change:
(find . -name \*.c | xargs grep -l qsort)

./bin/ps/ps.c
./contrib/gcc/*.c
./contrib/top/commands.c
./games/fortune/strfile/strfile.c
./gnu/usr.bin/sort/sort.c
./sbin/fsck/pass2.c

The fsck one is a nice one. Just wack your /usr and reboot :-)

Nick



On Fri, 20 Aug 1999, Archie Cobbs wrote:

 Christopher Seiwald writes:
  But as I'm proposing a change to a fairly sensitive piece of code, I'd
  like to keep the change as modest as possible.
 
 How about this?
 
 Index: qsort.c
 ===
 RCS file: /home/ncvs/src/lib/libc/stdlib/qsort.c,v
 retrieving revision 1.7
 diff -u -r1.7 qsort.c
 --- qsort.c   1997/02/22 15:03:14 1.7
 +++ qsort.c   1999/08/21 01:35:35
 @@ -153,7 +153,7 @@
   pb += es;
   pc -= es;
   }
 - if (swap_cnt == 0) {  /* Switch to insertion sort */
 + if (n = 32  swap_cnt == 0) {  /* Switch to insertion sort */
   for (pm = (char *)a + es; pm  (char *)a + n * es; pm += es)
   for (pl = pm; pl  (char *)a  cmp(pl - es, pl)  0;
pl -= es)
 
 
 -Archie
 
 ___
 Archie Cobbs   *   Whistle Communications, Inc.  *   http://www.whistle.com
 
 
 To Unsubscribe: send mail to [EMAIL PROTECTED]
 with "unsubscribe freebsd-hackers" in the body of the message
 
 

-- 
e-Mail: [EMAIL PROTECTED]



To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message



Re: anybody love qsort.c?

1999-08-21 Thread Akira Wada


Archie Cobbs wrote...
 Christopher Seiwald writes:
  But as I'm proposing a change to a fairly sensitive piece of code, I'd
  like to keep the change as modest as possible.
 
 How about this?
 
 Index: qsort.c
 ===
 RCS file: /home/ncvs/src/lib/libc/stdlib/qsort.c,v
 retrieving revision 1.7
 diff -u -r1.7 qsort.c
 --- qsort.c   1997/02/22 15:03:14 1.7
 +++ qsort.c   1999/08/21 01:35:35
 @@ -153,7 +153,7 @@
   pb += es;
   pc -= es;
   }
 - if (swap_cnt == 0) {  /* Switch to insertion sort */
 + if (n = 32  swap_cnt == 0) {  /* Switch to insertion sort */
   for (pm = (char *)a + es; pm  (char *)a + n * es; pm += es)
   for (pl = pm; pl  (char *)a  cmp(pl - es, pl)  0;
pl -= es)
 
 
 -Archie
 
 ___
 Archie Cobbs   *   Whistle Communications, Inc.  *   http://www.whistle.com
 

I think your modification would avoid the degeneration indicated 
by Christopher Seiwald, but degrade the advantage for the dataset  
sorted completely or sorted in reversed order, down to nearly
equal for random dataset.
I added a routine before selecting pivot to test current partition
sorted already and if so, to bypass partitioning. It works well
for dataset sorted in order, but doesn't work for dataset in
reversed order. I believe a reversed dataset would be partitioned
into two subpartitions sorted in order at the 1'st pass of 
the partitionings. Is this incorrect ?

--
for qsort.c,v 1.9 1998/06/30 11:05:11
@@ -102,2 +102,5 @@
swap_cnt = 0;
+   pl = (char *)a; pn = (char *)a + (n - 1) * es;
+   while (pl  pn  cmp(pl, pl + es) = 0) pl += es;
+   if (pl = pn) return;
if (n  7) {

--

-Akira Wada


*
$BOBED!!IK(B   $B!?(B Akira Wada 
$BEl5~ETJ!@8;TIpB"LnBf(B 1-27-5 $B%k%MJ!@8(B B609
   (tel,fax) 042-552-1143
   E-mail : [EMAIL PROTECTED]
*


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message



Re: anybody love qsort.c?

1999-08-21 Thread Akira Wada

Following my previous post:
I wrote ..
I believe a reversed dataset would be partitioned
into two subpartitions sorted in order at the 1'st pass of 
the partitionings. Is this incorrect ?

Sorry, I'd confirmed BSD qsort's partitioning logic does not
guarantee that "a reversed dataset would be partitioned
into two subpartitions sorted in order at the 1'st pass
of the partitionings".

-Akira Wada


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message



Re: anybody love qsort.c?

1999-08-21 Thread Nick Hibma

You can check the change by recompiling a few utils with the change:
(find . -name \*.c | xargs grep -l qsort)

./bin/ps/ps.c
./contrib/gcc/*.c
./contrib/top/commands.c
./games/fortune/strfile/strfile.c
./gnu/usr.bin/sort/sort.c
./sbin/fsck/pass2.c

The fsck one is a nice one. Just wack your /usr and reboot :-)

Nick



On Fri, 20 Aug 1999, Archie Cobbs wrote:

 Christopher Seiwald writes:
  But as I'm proposing a change to a fairly sensitive piece of code, I'd
  like to keep the change as modest as possible.
 
 How about this?
 
 Index: qsort.c
 ===
 RCS file: /home/ncvs/src/lib/libc/stdlib/qsort.c,v
 retrieving revision 1.7
 diff -u -r1.7 qsort.c
 --- qsort.c   1997/02/22 15:03:14 1.7
 +++ qsort.c   1999/08/21 01:35:35
 @@ -153,7 +153,7 @@
   pb += es;
   pc -= es;
   }
 - if (swap_cnt == 0) {  /* Switch to insertion sort */
 + if (n = 32  swap_cnt == 0) {  /* Switch to insertion sort */
   for (pm = (char *)a + es; pm  (char *)a + n * es; pm += es)
   for (pl = pm; pl  (char *)a  cmp(pl - es, pl)  0;
pl -= es)
 
 
 -Archie
 
 ___
 Archie Cobbs   *   Whistle Communications, Inc.  *   http://www.whistle.com
 
 
 To Unsubscribe: send mail to majord...@freebsd.org
 with unsubscribe freebsd-hackers in the body of the message
 
 

-- 
e-Mail: hi...@skylink.it



To Unsubscribe: send mail to majord...@freebsd.org
with unsubscribe freebsd-hackers in the body of the message



Re: anybody love qsort.c?

1999-08-21 Thread Akira Wada

Archie Cobbs wrote...
 Christopher Seiwald writes:
  But as I'm proposing a change to a fairly sensitive piece of code, I'd
  like to keep the change as modest as possible.
 
 How about this?
 
 Index: qsort.c
 ===
 RCS file: /home/ncvs/src/lib/libc/stdlib/qsort.c,v
 retrieving revision 1.7
 diff -u -r1.7 qsort.c
 --- qsort.c   1997/02/22 15:03:14 1.7
 +++ qsort.c   1999/08/21 01:35:35
 @@ -153,7 +153,7 @@
   pb += es;
   pc -= es;
   }
 - if (swap_cnt == 0) {  /* Switch to insertion sort */
 + if (n = 32  swap_cnt == 0) {  /* Switch to insertion sort */
   for (pm = (char *)a + es; pm  (char *)a + n * es; pm += es)
   for (pl = pm; pl  (char *)a  cmp(pl - es, pl)  0;
pl -= es)
 
 
 -Archie

I think your modification would avoid the degeneration indicated 
by Christopher Seiwald, but degrade the advantage for the dataset  
sorted completely or sorted in reversed order, down to nearly
equal for random dataset.
I added a routine before selecting pivot to test current partition
sorted already and if so, to bypass partitioning. It works well
for dataset sorted in order, but doesn't work for dataset in
reversed order. I believe a reversed dataset would be partitioned
into two subpartitions sorted in order at the 1'st pass of 
the partitionigs. Is this incorrect ?

--
for qsort.c,v 1.9 1998/06/30 11:05:11
@@ -102,2 +102,5 @@
swap_cnt = 0;
+   pl = (char *)a; pn = (char *)a + (n - 1) * es;
+   while (pl  pn  cmp(pl, pl + es) pl += es;
+   if (pl = pn) return;
if (n  7) {

--

-Akira Wada



To Unsubscribe: send mail to majord...@freebsd.org
with unsubscribe freebsd-hackers in the body of the message



Re: anybody love qsort.c?

1999-08-21 Thread Akira Wada

Archie Cobbs wrote...
 Christopher Seiwald writes:
  But as I'm proposing a change to a fairly sensitive piece of code, I'd
  like to keep the change as modest as possible.
 
 How about this?
 
 Index: qsort.c
 ===
 RCS file: /home/ncvs/src/lib/libc/stdlib/qsort.c,v
 retrieving revision 1.7
 diff -u -r1.7 qsort.c
 --- qsort.c   1997/02/22 15:03:14 1.7
 +++ qsort.c   1999/08/21 01:35:35
 @@ -153,7 +153,7 @@
   pb += es;
   pc -= es;
   }
 - if (swap_cnt == 0) {  /* Switch to insertion sort */
 + if (n = 32  swap_cnt == 0) {  /* Switch to insertion sort */
   for (pm = (char *)a + es; pm  (char *)a + n * es; pm += es)
   for (pl = pm; pl  (char *)a  cmp(pl - es, pl)  0;
pl -= es)
 
 
 -Archie
 
 ___
 Archie Cobbs   *   Whistle Communications, Inc.  *   http://www.whistle.com
 

I think your modification would avoid the degeneration indicated 
by Christopher Seiwald, but degrade the advantage for the dataset  
sorted completely or sorted in reversed order, down to nearly
equal for random dataset.
I added a routine before selecting pivot to test current partition
sorted already and if so, to bypass partitioning. It works well
for dataset sorted in order, but doesn't work for dataset in
reversed order. I believe a reversed dataset would be partitioned
into two subpartitions sorted in order at the 1'st pass of 
the partitionings. Is this incorrect ?

--
for qsort.c,v 1.9 1998/06/30 11:05:11
@@ -102,2 +102,5 @@
swap_cnt = 0;
+   pl = (char *)a; pn = (char *)a + (n - 1) * es;
+   while (pl  pn  cmp(pl, pl + es) = 0) pl += es;
+   if (pl = pn) return;
if (n  7) {

--

-Akira Wada


*
和田 彬   / Akira Wada 
東京都福生市武蔵野台 1-27-5 ルネ福生 B609
   (tel,fax) 042-552-1143
   E-mail : a-w...@mars.dti.ne.jp
*


To Unsubscribe: send mail to majord...@freebsd.org
with unsubscribe freebsd-hackers in the body of the message



Re: anybody love qsort.c?

1999-08-20 Thread Archie Cobbs

Christopher Seiwald writes:
 But as I'm proposing a change to a fairly sensitive piece of code, I'd
 like to keep the change as modest as possible.

How about this?

Index: qsort.c
===
RCS file: /home/ncvs/src/lib/libc/stdlib/qsort.c,v
retrieving revision 1.7
diff -u -r1.7 qsort.c
--- qsort.c 1997/02/22 15:03:14 1.7
+++ qsort.c 1999/08/21 01:35:35
@@ -153,7 +153,7 @@
pb += es;
pc -= es;
}
-   if (swap_cnt == 0) {  /* Switch to insertion sort */
+   if (n = 32  swap_cnt == 0) {  /* Switch to insertion sort */
for (pm = (char *)a + es; pm  (char *)a + n * es; pm += es)
for (pl = pm; pl  (char *)a  cmp(pl - es, pl)  0;
 pl -= es)


-Archie

___
Archie Cobbs   *   Whistle Communications, Inc.  *   http://www.whistle.com


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message



Re: anybody love qsort.c?

1999-08-20 Thread Archie Cobbs
Christopher Seiwald writes:
 But as I'm proposing a change to a fairly sensitive piece of code, I'd
 like to keep the change as modest as possible.

How about this?

Index: qsort.c
===
RCS file: /home/ncvs/src/lib/libc/stdlib/qsort.c,v
retrieving revision 1.7
diff -u -r1.7 qsort.c
--- qsort.c 1997/02/22 15:03:14 1.7
+++ qsort.c 1999/08/21 01:35:35
@@ -153,7 +153,7 @@
pb += es;
pc -= es;
}
-   if (swap_cnt == 0) {  /* Switch to insertion sort */
+   if (n = 32  swap_cnt == 0) {  /* Switch to insertion sort */
for (pm = (char *)a + es; pm  (char *)a + n * es; pm += es)
for (pl = pm; pl  (char *)a  cmp(pl - es, pl)  0;
 pl -= es)


-Archie

___
Archie Cobbs   *   Whistle Communications, Inc.  *   http://www.whistle.com


To Unsubscribe: send mail to majord...@freebsd.org
with unsubscribe freebsd-hackers in the body of the message



Re: anybody love qsort.c?

1999-08-19 Thread Narvi


Doesn't the qsort just switch to isort *if* the partition to sort is short
enough? 

Got to check it out, but afaik the size at that qsorts switch to isort is
usually between 8 and 24, with 16 being most common - both halfs are
shorter than 16, so they get isorted.

Sander

There is no love, no good, no happiness and no future -
all these are just illusions.

On Wed, 18 Aug 1999, Christopher Seiwald wrote:

 The FreeBSD qsort implementation has a rather nasty degenerate case.
 If you have data that partitions exactly about the median pivot, yet
 which is unsorted in either partition, you get get treated to an insertion
 sort.  Example:
 
   1 2 3 4 5 6 7 8 9 10 19 18 17 16 14 14 13 12 11
 
 qsort picks 10 as the median, does no swaps on its first pass, and so
 decides to switch to an insertion sort.   The idea is that if no swaps
 occur, the data is likely to be nearly already sorted and a good candidate
 for insertion sort.  This turns out to be a (very) bad idea if you have
 some unsorted data buried in the middle of a (very) large dataset.
 
 The implementation claims to come from Bentley and McIlroy's paper,
 "Engineering a Sort Function," and this is largely true, but the insertion
 sort optimization(?) isn't in that paper.  The GNU qsort.c only does an
 insertion sort if it is below a certain threshhold in size, and so never
 attempts such a sort on the whole dataset.  (The GNU qsort does a number
 of other things pooh-poohed in the Bentley paper.)
 
 It's a pretty straightforward change to bypass the insertion sort for
 large subsets of the data.  If no one has a strong love for qsort, I'll
 educate myself on how to make and contribute this change.
 
 Christopher
 
 Christopher Seiwald Perforce Software  1-510-864-7400
 [EMAIL PROTECTED] f-f-f-fast SCM   http://www.perforce.com
 
 
 
 To Unsubscribe: send mail to [EMAIL PROTECTED]
 with "unsubscribe freebsd-hackers" in the body of the message
 



To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message



Re: anybody love qsort.c?

1999-08-19 Thread Akira Wada


Christopher Seiwald writes:
 The FreeBSD qsort implementation has a rather nasty degenerate case.
 If you have data that partitions exactly about the median pivot, yet
 which is unsorted in either partition, you get get treated to an insertion
 sort.  Example:

   1 2 3 4 5 6 7 8 9 10 19 18 17 16 14 14 13 12 11

 qsort picks 10 as the median, does no swaps on its first pass, and so
 decides to switch to an insertion sort.   The idea is that if no swaps
 occur, the data is likely to be nearly already sorted and a good candidate
 for insertion sort.  This turns out to be a (very) bad idea if you have
 some unsorted data buried in the middle of a (very) large dataset.

 The implementation claims to come from Bentley and McIlroy's paper,
 "Engineering a Sort Function," and this is largely true, but the insertion
 sort optimization(?) isn't in that paper.  The GNU qsort.c only does an
 insertion sort if it is below a certain threshhold in size, and so never
 attempts such a sort on the whole dataset.  (The GNU qsort does a number
 of other things pooh-poohed in the Bentley paper.)

 It's a pretty straightforward change to bypass the insertion sort for
 large subsets of the data.  If no one has a strong love for qsort, I'll
 educate myself on how to make and contribute this change.

How about the code following sig. ?
And the other codes and information on:

  http://www.mars.dti.ne.jp/~a-wada/qsortlib.html

--
Akira Wada [EMAIL PROTECTED]
-
/*  a-wada/qsortlib/ccn/qsorts.cMar. 30, 1999 */
/*  Copyright (c) 1999  Akira Wada [EMAIL PROTECTED]*/

/*  Quick sort compatible with "qsort (); in stdlib.h"  */
/*  efficiency by reducing the times of key comparisons and   */
/*  word_mode_swapping for word_aligned object*/
/*  prevention from the degeneration caused by peculiar input */
/*  This qsort is not stable, but could avoid the bad behavior by many*/
/*  equal sort-keys. For stability, add a unique-key to compare-function. */

/*  If the object to be sorted is an array of pointers to structs with*/
/*  an unsigned integer key in ascending order, set the 3'rd argument */
/*  to the key position in bytes and the 4'th to NULL, then   */
/*  radixsort would be applied.  for example :*/
/*   kps = 16; qsort (pv, n, kps, (ftp) 0);   */

#include time.h
#include stdlib.h
void radixspi (void *, size_t, size_t);
void srandom (unsigned long); /* if not in stdlib.h */
unsigned long random ();  /* if not in stdlib.h */

#define MDT 16  /* threshold for median of the 5 or the more,MDT = 8 */
#define MDA  2  /* adjusting threshold slightly,   MDT + MDA = 4 */
#define MDM  2  /* multiplier for threshold, MDM = 2 */

#define DGT256  /* threshold for checking degeneration*/
#define DGV 16  /* degeneration assumed if the smaller  DGL  */
#define DGM  0  /* threshold for selecting no median  */
#define DGS  2  /* threshold for samples distributed uniformly*/
#define DGU  4  /* threshold for sampling at random   */
#define DGL ((unsigned) psz / DGV) * rl
#define defdgn  int dgn = 0, nsp, itv
#define ckdgnr(s, e) if (psz = DGT  s + DGL  e) dgn++
#define no_med  (psz  MDT  DGM  dgn  dgn = DGS)
#define symptm  (psz  MDT  dgn  DGS)
#define xcsmpl() do {  \
if (dgn = DGU) {   /* samples distributed uniformly */\
nsp = 3; while (psz  thr) thr *= MDM, nsp += 2;   \
itv = ((unsigned) psz / (nsp - 1)) * rl; k = l, n = r; \
for (thr = MDT; psz  thr; thr *= MDM) {   \
i += rl, k += itv; swap (i, k);\
j -= rl, n -= itv; swap (n, j);}}  \
else {  /* samples at random */\
if (dgn == DGU + 1) dgn++, srandom (time (0)); \
for (thr = (unsigned) thr / MDM; psz  thr; thr *= MDM) {  \
rdmsmpl (i, i, j); i += rl;\
rdmsmpl (j, i, j); j -= rl;}   \
rdmsmpl (m, i, j);}\
i = l, j = r, thr = MDT;} while (0)
#define rdmsmpl(x, s, e) if (x != (n = s + rl * (  \
random () % ((unsigned) (e - (s)) / rl + 1 swap (x, n)

#define swpwrk int t, w, z = sizeof (int), alg = ((int) av | 

Re: anybody love qsort.c?

1999-08-19 Thread Christopher Seiwald

| How about the code following sig. ?
| And the other codes and information on:
|
|   http://www.mars.dti.ne.jp/~a-wada/qsortlib.html

Unfortunately, I only use about 5% of my brain, and am incapable of
convincing myself (or anyone else) of the correctness or efficiency
of your qsort algorithm.

In my quick attempt to educate myself, I did read your page and found
it interesting.

But as I'm proposing a change to a fairly sensitive piece of code, I'd
like to keep the change as modest as possible.

Christopher


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message



Re: anybody love qsort.c?

1999-08-19 Thread Narvi

Doesn't the qsort just switch to isort *if* the partition to sort is short
enough? 

Got to check it out, but afaik the size at that qsorts switch to isort is
usually between 8 and 24, with 16 being most common - both halfs are
shorter than 16, so they get isorted.

Sander

There is no love, no good, no happiness and no future -
all these are just illusions.

On Wed, 18 Aug 1999, Christopher Seiwald wrote:

 The FreeBSD qsort implementation has a rather nasty degenerate case.
 If you have data that partitions exactly about the median pivot, yet
 which is unsorted in either partition, you get get treated to an insertion
 sort.  Example:
 
   1 2 3 4 5 6 7 8 9 10 19 18 17 16 14 14 13 12 11
 
 qsort picks 10 as the median, does no swaps on its first pass, and so
 decides to switch to an insertion sort.   The idea is that if no swaps
 occur, the data is likely to be nearly already sorted and a good candidate
 for insertion sort.  This turns out to be a (very) bad idea if you have
 some unsorted data buried in the middle of a (very) large dataset.
 
 The implementation claims to come from Bentley and McIlroy's paper,
 Engineering a Sort Function, and this is largely true, but the insertion
 sort optimization(?) isn't in that paper.  The GNU qsort.c only does an
 insertion sort if it is below a certain threshhold in size, and so never
 attempts such a sort on the whole dataset.  (The GNU qsort does a number
 of other things pooh-poohed in the Bentley paper.)
 
 It's a pretty straightforward change to bypass the insertion sort for
 large subsets of the data.  If no one has a strong love for qsort, I'll
 educate myself on how to make and contribute this change.
 
 Christopher
 
 Christopher Seiwald Perforce Software  1-510-864-7400
 seiw...@perforce.com f-f-f-fast SCM   http://www.perforce.com
 
 
 
 To Unsubscribe: send mail to majord...@freebsd.org
 with unsubscribe freebsd-hackers in the body of the message
 



To Unsubscribe: send mail to majord...@freebsd.org
with unsubscribe freebsd-hackers in the body of the message



Re: anybody love qsort.c?

1999-08-19 Thread Archie Cobbs
Narvi writes:
 Doesn't the qsort just switch to isort *if* the partition to sort is short
 enough? 

That's exactly Christopher's point. It should do this but it doesn't.
The code is complex but from a quick glance it appears that the decision
to switch to insertion sort does not depend on the total array length.

-Archie

___
Archie Cobbs   *   Whistle Communications, Inc.  *   http://www.whistle.com


To Unsubscribe: send mail to majord...@freebsd.org
with unsubscribe freebsd-hackers in the body of the message



Re: anybody love qsort.c?

1999-08-19 Thread Christopher Seiwald
Answers to sundry comments:

| why don't you implement this w/ the 5 element median selection qsort
| algorithm?  my professor for cis413 talked about this algorithm and
| that it really is the fastest qsort algorithm...   

qsort algorithms are like stock market tips: only good in retrospect.
The bentley scheme uses a 9 element median selection (80% better?).

| Doesn't the qsort just switch to isort *if* the partition to sort is short
| enough? 

Yes -- if it is  7 elements, it always does isort.  But the problem is
that it will switch to isort if it believes the data is mostly sorted, and
it comes to this conclusion incorrectly.  Unfortunately, I haven't found
a sure-fire way to improve this.

| The code is complex but from a quick glance it appears that the decision
| to switch to insertion sort does not depend on the total array length.

Right.  Actually, once you get into it, the code is relatively simple.
It is one of the features of the Bentley code.

Christopher


To Unsubscribe: send mail to majord...@freebsd.org
with unsubscribe freebsd-hackers in the body of the message



Re: anybody love qsort.c?

1999-08-19 Thread Akira Wada

Christopher Seiwald writes:
 The FreeBSD qsort implementation has a rather nasty degenerate case.
 If you have data that partitions exactly about the median pivot, yet
 which is unsorted in either partition, you get get treated to an insertion
 sort.  Example:

   1 2 3 4 5 6 7 8 9 10 19 18 17 16 14 14 13 12 11

 qsort picks 10 as the median, does no swaps on its first pass, and so
 decides to switch to an insertion sort.   The idea is that if no swaps
 occur, the data is likely to be nearly already sorted and a good candidate
 for insertion sort.  This turns out to be a (very) bad idea if you have
 some unsorted data buried in the middle of a (very) large dataset.

 The implementation claims to come from Bentley and McIlroy's paper,
 Engineering a Sort Function, and this is largely true, but the insertion
 sort optimization(?) isn't in that paper.  The GNU qsort.c only does an
 insertion sort if it is below a certain threshhold in size, and so never
 attempts such a sort on the whole dataset.  (The GNU qsort does a number
 of other things pooh-poohed in the Bentley paper.)

 It's a pretty straightforward change to bypass the insertion sort for
 large subsets of the data.  If no one has a strong love for qsort, I'll
 educate myself on how to make and contribute this change.

How about the code following sig. ?
And the other codes and information on:

  http://www.mars.dti.ne.jp/~a-wada/qsortlib.html

--
Akira Wada a-w...@mars.dti.ne.jp
-
/*  a-wada/qsortlib/ccn/qsorts.cMar. 30, 1999 */
/*  Copyright (c) 1999  Akira Wada a-w...@mars.dti.ne.jp*/

/*  Quick sort compatible with qsort (); in stdlib.h  */
/*  efficiency by reducing the times of key comparisons and   */
/*  word_mode_swapping for word_aligned object*/
/*  prevention from the degeneration caused by peculiar input */
/*  This qsort is not stable, but could avoid the bad behavior by many*/
/*  equal sort-keys. For stability, add a unique-key to compare-function. */

/*  If the object to be sorted is an array of pointers to structs with*/
/*  an unsigned integer key in ascending order, set the 3'rd argument */
/*  to the key position in bytes and the 4'th to NULL, then   */
/*  radixsort would be applied.  for example :*/
/*   kps = 16; qsort (pv, n, kps, (ftp) 0);   */

#include time.h
#include stdlib.h
void radixspi (void *, size_t, size_t);
void srandom (unsigned long); /* if not in stdlib.h */
unsigned long random ();  /* if not in stdlib.h */

#define MDT 16  /* threshold for median of the 5 or the more,MDT = 8 */
#define MDA  2  /* adjusting threshold slightly,   MDT + MDA = 4 */
#define MDM  2  /* multiplier for threshold, MDM = 2 */

#define DGT256  /* threshold for checking degeneration*/
#define DGV 16  /* degeneration assumed if the smaller  DGL  */
#define DGM  0  /* threshold for selecting no median  */
#define DGS  2  /* threshold for samples distributed uniformly*/
#define DGU  4  /* threshold for sampling at random   */
#define DGL ((unsigned) psz / DGV) * rl
#define defdgn  int dgn = 0, nsp, itv
#define ckdgnr(s, e) if (psz = DGT  s + DGL  e) dgn++
#define no_med  (psz  MDT  DGM  dgn  dgn = DGS)
#define symptm  (psz  MDT  dgn  DGS)
#define xcsmpl() do {  \
if (dgn = DGU) {   /* samples distributed uniformly */\
nsp = 3; while (psz  thr) thr *= MDM, nsp += 2;   \
itv = ((unsigned) psz / (nsp - 1)) * rl; k = l, n = r; \
for (thr = MDT; psz  thr; thr *= MDM) {   \
i += rl, k += itv; swap (i, k);\
j -= rl, n -= itv; swap (n, j);}}  \
else {  /* samples at random */\
if (dgn == DGU + 1) dgn++, srandom (time (0)); \
for (thr = (unsigned) thr / MDM; psz  thr; thr *= MDM) {  \
rdmsmpl (i, i, j); i += rl;\
rdmsmpl (j, i, j); j -= rl;}   \
rdmsmpl (m, i, j);}\
i = l, j = r, thr = MDT;} while (0)
#define rdmsmpl(x, s, e) if (x != (n = s + rl * (  \
random () % ((unsigned) (e - (s)) / rl + 1 swap (x, n)

#define swpwrk int t, w, z = sizeof (int), alg = ((int) av | 

Re: anybody love qsort.c?

1999-08-19 Thread Christopher Seiwald
| How about the code following sig. ?
| And the other codes and information on:
|
|   http://www.mars.dti.ne.jp/~a-wada/qsortlib.html

Unfortunately, I only use about 5% of my brain, and am incapable of
convincing myself (or anyone else) of the correctness or efficiency
of your qsort algorithm.

In my quick attempt to educate myself, I did read your page and found
it interesting.

But as I'm proposing a change to a fairly sensitive piece of code, I'd
like to keep the change as modest as possible.

Christopher


To Unsubscribe: send mail to majord...@freebsd.org
with unsubscribe freebsd-hackers in the body of the message



anybody love qsort.c?

1999-08-18 Thread Christopher Seiwald

The FreeBSD qsort implementation has a rather nasty degenerate case.
If you have data that partitions exactly about the median pivot, yet
which is unsorted in either partition, you get get treated to an insertion
sort.  Example:

1 2 3 4 5 6 7 8 9 10 19 18 17 16 14 14 13 12 11

qsort picks 10 as the median, does no swaps on its first pass, and so
decides to switch to an insertion sort.   The idea is that if no swaps
occur, the data is likely to be nearly already sorted and a good candidate
for insertion sort.  This turns out to be a (very) bad idea if you have
some unsorted data buried in the middle of a (very) large dataset.

The implementation claims to come from Bentley and McIlroy's paper,
"Engineering a Sort Function," and this is largely true, but the insertion
sort optimization(?) isn't in that paper.  The GNU qsort.c only does an
insertion sort if it is below a certain threshhold in size, and so never
attempts such a sort on the whole dataset.  (The GNU qsort does a number
of other things pooh-poohed in the Bentley paper.)

It's a pretty straightforward change to bypass the insertion sort for
large subsets of the data.  If no one has a strong love for qsort, I'll
educate myself on how to make and contribute this change.

Christopher

Christopher Seiwald Perforce Software  1-510-864-7400
[EMAIL PROTECTED] f-f-f-fast SCM   http://www.perforce.com



To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message



Re: anybody love qsort.c?

1999-08-18 Thread Archie Cobbs

Christopher Seiwald writes:
 The FreeBSD qsort implementation has a rather nasty degenerate case.
 If you have data that partitions exactly about the median pivot, yet
 which is unsorted in either partition, you get get treated to an insertion
 sort.  Example:
 
   1 2 3 4 5 6 7 8 9 10 19 18 17 16 14 14 13 12 11
 
 qsort picks 10 as the median, does no swaps on its first pass, and so
 decides to switch to an insertion sort.   The idea is that if no swaps
 occur, the data is likely to be nearly already sorted and a good candidate
 for insertion sort.  This turns out to be a (very) bad idea if you have
 some unsorted data buried in the middle of a (very) large dataset.
 
 The implementation claims to come from Bentley and McIlroy's paper,
 "Engineering a Sort Function," and this is largely true, but the insertion
 sort optimization(?) isn't in that paper.  The GNU qsort.c only does an
 insertion sort if it is below a certain threshhold in size, and so never
 attempts such a sort on the whole dataset.  (The GNU qsort does a number
 of other things pooh-poohed in the Bentley paper.)
 
 It's a pretty straightforward change to bypass the insertion sort for
 large subsets of the data.  If no one has a strong love for qsort, I'll
 educate myself on how to make and contribute this change.

Sounds good to me.. let's fix it.

-Archie

___
Archie Cobbs   *   Whistle Communications, Inc.  *   http://www.whistle.com


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message



Re: anybody love qsort.c?

1999-08-18 Thread John-Mark Gurney

Christopher Seiwald scribbled this message on Aug 18:
 It's a pretty straightforward change to bypass the insertion sort for
 large subsets of the data.  If no one has a strong love for qsort, I'll
 educate myself on how to make and contribute this change.

why don't you implement this w/ the 5 element median selection qsort
algorithm?  my professor for cis413 talked about this algorithm and
that it really is the fastest qsort algorithm...   I don't have any
pointers to a paper on this... but I might be able to dig some info up
on it if you are interested...

-- 
  John-Mark Gurney  Voice: +1 541 684 8449
  Cu Networking   P.O. Box 5693, 97405

  "The soul contains in itself the event that shall presently befall it.
  The event is only the actualizing of its thought." -- Ralph Waldo Emerson


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message



anybody love qsort.c?

1999-08-18 Thread Christopher Seiwald
The FreeBSD qsort implementation has a rather nasty degenerate case.
If you have data that partitions exactly about the median pivot, yet
which is unsorted in either partition, you get get treated to an insertion
sort.  Example:

1 2 3 4 5 6 7 8 9 10 19 18 17 16 14 14 13 12 11

qsort picks 10 as the median, does no swaps on its first pass, and so
decides to switch to an insertion sort.   The idea is that if no swaps
occur, the data is likely to be nearly already sorted and a good candidate
for insertion sort.  This turns out to be a (very) bad idea if you have
some unsorted data buried in the middle of a (very) large dataset.

The implementation claims to come from Bentley and McIlroy's paper,
Engineering a Sort Function, and this is largely true, but the insertion
sort optimization(?) isn't in that paper.  The GNU qsort.c only does an
insertion sort if it is below a certain threshhold in size, and so never
attempts such a sort on the whole dataset.  (The GNU qsort does a number
of other things pooh-poohed in the Bentley paper.)

It's a pretty straightforward change to bypass the insertion sort for
large subsets of the data.  If no one has a strong love for qsort, I'll
educate myself on how to make and contribute this change.

Christopher

Christopher Seiwald Perforce Software  1-510-864-7400
seiw...@perforce.com f-f-f-fast SCM   http://www.perforce.com



To Unsubscribe: send mail to majord...@freebsd.org
with unsubscribe freebsd-hackers in the body of the message



Re: anybody love qsort.c?

1999-08-18 Thread Archie Cobbs
Christopher Seiwald writes:
 The FreeBSD qsort implementation has a rather nasty degenerate case.
 If you have data that partitions exactly about the median pivot, yet
 which is unsorted in either partition, you get get treated to an insertion
 sort.  Example:
 
   1 2 3 4 5 6 7 8 9 10 19 18 17 16 14 14 13 12 11
 
 qsort picks 10 as the median, does no swaps on its first pass, and so
 decides to switch to an insertion sort.   The idea is that if no swaps
 occur, the data is likely to be nearly already sorted and a good candidate
 for insertion sort.  This turns out to be a (very) bad idea if you have
 some unsorted data buried in the middle of a (very) large dataset.
 
 The implementation claims to come from Bentley and McIlroy's paper,
 Engineering a Sort Function, and this is largely true, but the insertion
 sort optimization(?) isn't in that paper.  The GNU qsort.c only does an
 insertion sort if it is below a certain threshhold in size, and so never
 attempts such a sort on the whole dataset.  (The GNU qsort does a number
 of other things pooh-poohed in the Bentley paper.)
 
 It's a pretty straightforward change to bypass the insertion sort for
 large subsets of the data.  If no one has a strong love for qsort, I'll
 educate myself on how to make and contribute this change.

Sounds good to me.. let's fix it.

-Archie

___
Archie Cobbs   *   Whistle Communications, Inc.  *   http://www.whistle.com


To Unsubscribe: send mail to majord...@freebsd.org
with unsubscribe freebsd-hackers in the body of the message



Re: anybody love qsort.c?

1999-08-18 Thread John-Mark Gurney
Christopher Seiwald scribbled this message on Aug 18:
 It's a pretty straightforward change to bypass the insertion sort for
 large subsets of the data.  If no one has a strong love for qsort, I'll
 educate myself on how to make and contribute this change.

why don't you implement this w/ the 5 element median selection qsort
algorithm?  my professor for cis413 talked about this algorithm and
that it really is the fastest qsort algorithm...   I don't have any
pointers to a paper on this... but I might be able to dig some info up
on it if you are interested...

-- 
  John-Mark Gurney  Voice: +1 541 684 8449
  Cu Networking   P.O. Box 5693, 97405

  The soul contains in itself the event that shall presently befall it.
  The event is only the actualizing of its thought. -- Ralph Waldo Emerson


To Unsubscribe: send mail to majord...@freebsd.org
with unsubscribe freebsd-hackers in the body of the message