RE: New portion of improvements for Dual-Pivot Quicksort

Dmytro Sheyko Wed, 19 May 2010 02:04:42 -0700

Vladimir,

I can see that you changed sortNegZeroAndNaN(float[]...) but probably forgot to 
change sortNegZeroAndNaN(double[]...).


You really puzzled me with failed testcase and note that sorting algorithm 
(without special attention to zeros) generally may change number of negative 
zeros.
I will provide my comments later.

As for counting sort, I think we should use single format style over the file 
(unless we have valuable reason not to do this). I mean to choose
1)
        if (toIndex - fromIndex > COUNTING_SORT_THRESHOLD_FOR_SHORT_OR_CHAR) {
            countingSort(a, fromIndex, toIndex);
            return;
        }
        sort(a, fromIndex, toIndex - 1, true);
2)
        if (toIndex - fromIndex > COUNTING_SORT_THRESHOLD_FOR_SHORT_OR_CHAR) {

            countingSort(a, fromIndex, toIndex);

        } else {
            sort(a, fromIndex, toIndex - 1, true);


        }
I prefer the second one.

Thanks a lot,
Dmytro Sheyko



> Date: Tue, 18 May 2010 18:57:50 +0400
> From: [email protected]
> Subject: Re: New portion of improvements for Dual-Pivot Quicksort
> To: [email protected]
> CC: [email protected]
> 
> Hello,
> 
> I've run your modification for counting sort, it real faster.
> I attached new version with your changes (I did little bit
> format it) and included my case with float/double.
> 
> Note that you modification doesn't pass test from Sorting class,
> which I sent earlier. It fails on float/double test:
> 
> Test #3: random = 666, len = 34, a = 0, g = 6, z = 9, n = 10, p = 9
> 
> I suggest shorter method (which is based on your idea to skip counting
> negative zeros on Phase 1.): I found find first zero index (or it will
> be index of first positive element if no zeros at all, or last negative,
> if no positive and zero elements) and then swap negative zero to the
> beginning of the sub-range.
> 
> int hi = right;
> 
> while (left < hi) {
>      int middle = (left + hi) >>> 1;
>      float middleValue = a[middle];
> 
>      if (middleValue < 0.0f) {
>          left = middle + 1;
>      } else {
>          hi = middle;
>      }
> }
> 
> for (int k = left, p = left; k <= right; k++) {
>      float ak = a[k];
>      if (ak != 0.0f) {
>          return;
>      }
>      if (Float.floatToRawIntBits(ak) < 0) { // ak is -0.0f
>          a[k] = +0.0f;
>          a[p++] = -0.0f;
>      }
> }
> 
> Important note: in partitioning loop there are several places
> (marked by // !) where potential bug with -0.0 could be
> (when pivot and a[great] are zeros with different signs):
> 
>      if (a[great] == pivot1) {
>          a[k] = a[less];
> -        a[less++] = pivot1; // !
> +        a[less++] = a[great];
>      } else { // pivot1 < a[great] < pivot2
>          a[k] = a[great];
>      }
> -    a[great--] = pivot2; // !
> +    a[great--] = ak;
> } else if (ak == pivot1) { // Move a[k] to left part
>      a[k] = a[less];
> -    a[less++] = pivot1; // !
> +    a[less++] = ak;
> }
> 
> and the same in "Pivots are equal" branch.
> 
> I did changes "pivot1/2 -> ak" in methods for all types
> and "pivot1 -> a[great]" in float/double sections only.
> 
> Please, review format changes for counting sort and new version
> of Phase 3 for float/double.
> 
> Thank you,
> Vladimir
> 
> Dmytro Sheyko wrote:
> > Hi,
> > 
> > About counting sort again.
> > 
> > 1. This condition "i < count.length && k <= right" is excessive. Any one 
> > conjunct is enough. "k <= right" seems better.
> > 2. No need to calculate "short value = (short) (i + Short.MIN_VALUE)" 
> > when "count[i]" is zero.
> > 3. For signed primitives (byte and short) we would better loop backward. 
> > Thanks to "k >= fromIndex" condition we will quit looping earlier 
> > assuming that typically we work with positive numbers.
> > For unsigned primitives (char) we would better loop forward because 
> > typically we work with characters about zero (ASCII).
> > 
> > -        for (int i = 0, k = left; i < count.length && k <= right; i++) {
> > -            short value = (short) (i + Short.MIN_VALUE);
> > -            for (int s = count[i]; s > 0; s--) {
> > -                a[k++] = value;
> > -            }
> > -        }
> > 
> > +        for (int i = NUM_SHORT_VALUES - 1, k = toIndex - 1; k >= 
> > fromIndex; --i) {
> > +            while (count[i] == 0) --i;
> > +            short value = (short) (i + Short.MIN_VALUE);
> > +            int s = count[i];
> > +            do { a[k--] = value; } while (--s > 0);
> > +        }
> > 
> > Thanks,
> > Dmytro Sheyko
> > 
> >  > From: [email protected]
> >  > To: [email protected]
> >  > CC: [email protected]; [email protected]
> >  > Subject: Re[2]: New portion of improvements for Dual-Pivot Quicksort
> >  > Date: Tue, 18 May 2010 01:11:19 +0400
> >  >
> >  > Sounds good!
> >  > Will consider too...
> >  >
> >  > Mon, 17 May 2010 22:24:11 +0700 письмо от Dmytro Sheyko 
> > <[email protected]>:
> >  >
> >  > > Hi,
> >  > >
> >  > > Regarding counting sort. We can check whether we should switch to 
> > counting sort only once in the beginning.
> >  > >
> >  > > > Date: Mon, 17 May 2010 17:30:37 +0400
> >  > > > From: [email protected]
> >  > > > Subject: Re: New portion of improvements for Dual-Pivot Quicksort
> >  > > > To: [email protected]
> >  > > > CC: [email protected]
> >  > > >
> >  > > > Hello,
> >  > > >
> >  > > > Thank you for review, I'll check and run tests again with you 
> > changes.
> >  > > >
> >  > > > Thank you,
> >  > > > Vladimir
> >  > > >
> >  > > > Dmytro Sheyko wrote:
> >  > > > > Hello,
> >  > > > >
> >  > > > > More ideas.
> >  > > > >
> >  > > > > 1. We can use
> >  > > > > Double.doubleToRawLongBits instead of Double.doubleToLongBits and
> >  > > > > Float.floatToRawIntBits instead of Float.floatToIntBits.
> >  > > > > No need to handle NaN's because they all are placed to the end 
> > of array.
> >  > > > >
> >  > > > > 2. Note that
> >  > > > > Double.doubleToRawLongBits(+0.0) == 0L and
> >  > > > > Double.doubleToRawLongBits(-0.0) == Long.MIN_VALUE and
> >  > > > > Float.floatToRawIntBits(+0.0) == 0 and
> >  > > > > Float.floatToRawIntBits(-0.0) == Integer.MIN_VALUE.
> >  > > > >
> >  > > > > Comparing with is zero usually more efficient (or at least not 
> > worse)
> >  > > > > than with other values. Thus such pattern
> >  > > > >
> >  > > > > if (ak == 0.0f && NEGATIVE_ZERO == Float.floatToIntBits(ak))
> >  > > > >
> >  > > > > can be replaced with
> >  > > > >
> >  > > > > if (ak == 0.0f && Float.floatToIntBits(ak) < 0)
> >  > > > >
> >  > > > > 3. It would be more efficient to count negative zeros after 
> > sorting.
> >  > > > > General sorting algorithm puts both negative and positive zeros 
> > together
> >  > > > > (but maybe not in right order).
> >  > > > > Therefore we have to process less elements because usually we 
> > have less
> >  > > > > zeros than other numbers.
> >  > > > >
> >  > > > > Thanks,
> >  > > > > Dmytro Sheyko
> >  > > > >
> >  > > > > > From: [email protected]
> >  > > > > > To: [email protected]; [email protected]
> >  > > > > > CC: [email protected]; [email protected]
> >  > > > > > Subject: Re[6]: New portion of improvements for Dual-Pivot 
> > Quicksort
> >  > > > > > Date: Fri, 14 May 2010 23:54:06 +0400
> >  > > > > >
> >  > > > > > Hello,
> >  > > > > >
> >  > > > > > I've updated the class, please, review the changes.
> >  > > > > >
> >  > > > > > Vladimir
> >  > > > > >
> >  > > > > > Fri, 14 May 2010 01:48:11 +0700 письмо от Dmytro Sheyko
> >  > > > > <[email protected]>:
> >  > > > > >
> >  > > > > > > Yes. I prefer F (Find First zero using binary search) over 
> > C (Count
> >  > > > > negatives) and S (Smart Scan for zero).
> >  > > > > > >
> >  > > > > > > > From: [email protected]
> >  > > > > > > > To: [email protected]
> >  > > > > > > > CC: [email protected]; [email protected];
> >  > > > > [email protected]
> >  > > > > > > > Subject: Re[4]: New portion of improvements for 
> > Dual-Pivot Quicksort
> >  > > > > > > > Date: Thu, 13 May 2010 21:34:54 +0400
> >  > > > > > > >
> >  > > > > > > > Dmytro,
> >  > > > > > > >
> >  > > > > > > > I've tested your suggested variants, and found that case "C"
> >  > > > > > > > (very interesting approach to find first position of zero
> >  > > > > > > > by counting negative elements) works slower than original
> >  > > > > > > > or two other cases.
> >  > > > > > > >
> >  > > > > > > > Implementations "F" and "S" are very close to each other
> >  > > > > > > > and little bit faster than original. I prefer case "F":
> >  > > > > > > > it is shorter and more clear. Do you agree?
> >  > > > > > > >
> >  > > > > > > > I'll prepare updated DualPivotQuicksort file and send it
> >  > > > > > > > tomorrow.
> >  > > > > > > >
> >  > > > > > > > Thank you,
> >  > > > > > > > Vladimir
> >  > > > > > > >
> >  > > > > > > > Wed, 12 May 2010 17:04:52 +0700 письмо от Dmytro Sheyko
> >  > > > > <[email protected]>:
> >  > > > > > > >
> >  > > > > > > > > Vladimir,
> >  > > > > > > > >
> >  > > > > > > > > Your changes are good for me.
> >  > > > > > > > >
> >  > > > > > > > > Additionally I have some comments/proposals regarding 
> > dealing
> >  > > > > with negative zeros.
> >  > > > > > > > >
> >  > > > > > > > > 1. Scanning for the first zero we can avoid range check 
> > (i >=
> >  > > > > left) if we have at least one negative value.
> >  > > > > > > > > --- DualPivotQuicksort.java Tue May 11 09:04:19 2010
> >  > > > > > > > > +++ DualPivotQuicksortS.java Wed May 12 12:10:46 2010
> >  > > > > > > > > @@ -1705,10 +1705,15 @@
> >  > > > > > > > > }
> >  > > > > > > > >
> >  > > > > > > > > // Find first zero element
> >  > > > > > > > > - int zeroIndex = findAnyZero(a, left, n);
> >  > > > > > > > > + int zeroIndex = 0;
> >  > > > > > > > >
> >  > > > > > > > > - for (int i = zeroIndex - 1; i >= left && a[i] == 
> > 0.0f; i--) {
> >  > > > > > > > > - zeroIndex = i;
> >  > > > > > > > > + if (a[left] < 0.0f) {
> >  > > > > > > > > + zeroIndex = findAnyZero(a, left, n);
> >  > > > > > > > > +
> >  > > > > > > > > + // there is at least one negative value, so range 
> > check is
> >  > > > > not needed
> >  > > > > > > > > + for (int i = zeroIndex - 1; /*i >= left &&*/ a[i] == 
> > 0.0f; i--) {
> >  > > > > > > > > + zeroIndex = i;
> >  > > > > > > > > + }
> >  > > > > > > > > }
> >  > > > > > > > >
> >  > > > > > > > > // Turn the right number of positive zeros back into 
> > negative zeros
> >  > > > > > > > >
> >  > > > > > > > > 2. We can find the position of the first zero by counting
> >  > > > > negative values during preprocessing phase.
> >  > > > > > > > > --- DualPivotQuicksort.java Tue May 11 09:04:19 2010
> >  > > > > > > > > +++ DualPivotQuicksortC.java Wed May 12 12:01:24 2010
> >  > > > > > > > > @@ -1678,7 +1678,7 @@
> >  > > > > > > > > * Phase 1: Count negative zeros and move NaNs to end of 
> > array.
> >  > > > > > > > > */
> >  > > > > > > > > final int NEGATIVE_ZERO = Float.floatToIntBits(-0.0f);
> >  > > > > > > > > - int numNegativeZeros = 0;
> >  > > > > > > > > + int numNegativeZeros = 0, numNegativeValues = 0;
> >  > > > > > > > > int n = right;
> >  > > > > > > > >
> >  > > > > > > > > for (int k = left; k <= n; k++) {
> >  > > > > > > > > @@ -1689,6 +1689,8 @@
> >  > > > > > > > > } else if (ak != ak) { // i.e., ak is NaN
> >  > > > > > > > > a[k--] = a[n];
> >  > > > > > > > > a[n--] = Float.NaN;
> >  > > > > > > > > + } else if (ak < 0.0f) {
> >  > > > > > > > > + numNegativeValues++;
> >  > > > > > > > > }
> >  > > > > > > > > }
> >  > > > > > > > >
> >  > > > > > > > > @@ -1705,7 +1707,7 @@
> >  > > > > > > > > }
> >  > > > > > > > >
> >  > > > > > > > > // Find first zero element
> >  > > > > > > > > - int zeroIndex = findAnyZero(a, left, n);
> >  > > > > > > > > + int zeroIndex = numNegativeValues;
> >  > > > > > > > >
> >  > > > > > > > > for (int i = zeroIndex - 1; i >= left && a[i] == 0.0f; 
> > i--) {
> >  > > > > > > > > zeroIndex = i;
> >  > > > > > > > >
> >  > > > > > > > > 3. We can use binary search to find the first zero and 
> > thus
> >  > > > > avoid linear scan.
> >  > > > > > > > > --- DualPivotQuicksort.java Tue May 11 09:04:19 2010
> >  > > > > > > > > +++ DualPivotQuicksortF.java Wed May 12 12:03:58 2010
> >  > > > > > > > > @@ -1705,11 +1705,7 @@
> >  > > > > > > > > }
> >  > > > > > > > >
> >  > > > > > > > > // Find first zero element
> >  > > > > > > > > - int zeroIndex = findAnyZero(a, left, n);
> >  > > > > > > > > -
> >  > > > > > > > > - for (int i = zeroIndex - 1; i >= left && a[i] == 
> > 0.0f; i--) {
> >  > > > > > > > > - zeroIndex = i;
> >  > > > > > > > > - }
> >  > > > > > > > > + int zeroIndex = findFirstZero(a, left, n);
> >  > > > > > > > >
> >  > > > > > > > > // Turn the right number of positive zeros back into 
> > negative zeros
> >  > > > > > > > > for (int i = zeroIndex, m = zeroIndex + 
> > numNegativeZeros; i <
> >  > > > > m; i++) {
> >  > > > > > > > > @@ -1718,7 +1714,7 @@
> >  > > > > > > > > }
> >  > > > > > > > >
> >  > > > > > > > > /**
> >  > > > > > > > > - * Returns the index of some zero element in the 
> > specified
> >  > > > > range via
> >  > > > > > > > > + * Returns the index of the first zero element in the
> >  > > > > specified range via
> >  > > > > > > > > * binary search. The range is assumed to be sorted, and 
> > must
> >  > > > > contain
> >  > > > > > > > > * at least one zero.
> >  > > > > > > > > *
> >  > > > > > > > > @@ -1726,18 +1722,17 @@
> >  > > > > > > > > * @param low the index of the first element, inclusive, 
> > to be
> >  > > > > searched
> >  > > > > > > > > * @param high the index of the last element, inclusive, 
> > to be
> >  > > > > searched
> >  > > > > > > > > */
> >  > > > > > > > > - private static int findAnyZero(float[] a, int low, 
> > int high) {
> >  > > > > > > > > - while (true) {
> >  > > > > > > > > + private static int findFirstZero(float[] a, int low, 
> > int high) {
> >  > > > > > > > > + while (low < high) {
> >  > > > > > > > > int middle = (low + high) >>> 1;
> >  > > > > > > > > float middleValue = a[middle];
> >  > > > > > > > >
> >  > > > > > > > > if (middleValue < 0.0f) {
> >  > > > > > > > > low = middle + 1;
> >  > > > > > > > > - } else if (middleValue > 0.0f) {
> >  > > > > > > > > - high = middle - 1;
> >  > > > > > > > > - } else { // middleValue == 0.0f
> >  > > > > > > > > - return middle;
> >  > > > > > > > > + } else { // middleValue >= 0.0f
> >  > > > > > > > > + high = middle;
> >  > > > > > > > > }
> >  > > > > > > > > + return low;
> >  > > > > > > > > }
> >  > > > > > > > > }
> >  > > > > > > > >
> >  > > > > > > > > Counting negative values appeared more expensive than 
> > any other
> >  > > > > variants.
> >  > > > > > > > > The last proposal seems to me as efficient as the current
> >  > > > > solution is in its worst case - when we have only one negative 
> > zero (in
> >  > > > > the half of array).
> >  > > > > > > > > And it shows the best result if we have many zeros.
> >  > > > > > > > >
> >  > > > > > > > > Regards,
> >  > > > > > > > > Dmytro Sheyko
> >  > > > > > > > >
> >  > > > > > > > > > From: [email protected]
> >  > > > > > > > > > To: [email protected]; [email protected]
> >  > > > > > > > > > CC: [email protected]; [email protected]
> >  > > > > > > > > > Subject: Re[2]: New portion of improvements for 
> > Dual-Pivot
> >  > > > > Quicksort
> >  > > > > > > > > > Date: Sun, 9 May 2010 23:51:27 +0400
> >  > > > > > > > > >
> >  > > > > > > > > > Josh,
> >  > > > > > > > > > Dmytro,
> >  > > > > > > > > >
> >  > > > > > > > > > I have done more thoroughly testing "great - less > 5 *
> >  > > > > seventh" vs. "less < e1 && great > e5",
> >  > > > > > > > > > and found that more symmetric code "less < e1 && 
> > great > e5"
> >  > > > > is little bit faster, ~0.5..0.7%
> >  > > > > > > > > > on both VMs. Other code has not been changed.
> >  > > > > > > > > >
> >  > > > > > > > > > Please, take the latest version in attachment.
> >  > > > > > > > > >
> >  > > > > > > > > > Vladimir
> >  > > > > > > > > >
> >  > > > > > > > > > Tue, 4 May 2010 21:57:42 -0700 письмо от Joshua Bloch
> >  > > > > <[email protected]>:
> >  > > > > > > > > >
> >  > > > > > > > > > > Vladimir,
> >  > > > > > > > > > >
> >  > > > > > > > > > > Old:
> >  > > > > > > > > > >
> >  > > > > > > > > > >298 if (less < e1 && great > e5) {
> >  > > > > > > > > > >
> >  > > > > > > > > > > New:
> >  > > > > > > > > > >
> >  > > > > > > > > > >256 if (great - less > 5 * seventh) {
> >  > > > > > > > > >
> >  > > > > > > > > > >Regards,
> >  > > > > > > > > > >Josh
                                          
_________________________________________________________________
Hotmail: Trusted email with powerful SPAM protection.
https://signup.live.com/signup.aspx?id=60969

RE: New portion of improvements for Dual-Pivot Quicksort

Reply via email to