This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 1304974  [SPARK-26955][CORE] Align Spark's TimSort to jdk11 
implementation
1304974 is described below

commit 13049745391f97601ebc1402e77210638d908e40
Author: Maxim Gekk <maxim.g...@databricks.com>
AuthorDate: Thu Feb 21 22:18:23 2019 -0600

    [SPARK-26955][CORE] Align Spark's TimSort to jdk11 implementation
    
    ## What changes were proposed in this pull request?
    
    Spark's TimSort deviates from JDK 11 TimSort in a couple places:
    - `stackLen` was increased in jdk
    - additional cases for break in `mergeCollapse`: `n < 0`
    
    In the PR, I propose to align Spark TimSort to jdk implementation.
    
    ## How was this patch tested?
    
    By existing test suites, in particular, `SorterSuite`.
    
    Closes #23858 from MaxGekk/timsort-java-alignment.
    
    Authored-by: Maxim Gekk <maxim.g...@databricks.com>
    Signed-off-by: Sean Owen <sean.o...@databricks.com>
---
 .../java/org/apache/spark/util/collection/TimSort.java  | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/core/src/main/java/org/apache/spark/util/collection/TimSort.java 
b/core/src/main/java/org/apache/spark/util/collection/TimSort.java
index 40b5fb7..3142866 100644
--- a/core/src/main/java/org/apache/spark/util/collection/TimSort.java
+++ b/core/src/main/java/org/apache/spark/util/collection/TimSort.java
@@ -409,10 +409,14 @@ class TimSort<K, Buffer> {
        * large) stack lengths for smaller arrays.  The "magic numbers" in the
        * computation below must be changed if MIN_MERGE is decreased.  See
        * the MIN_MERGE declaration above for more information.
+       * The maximum value of 49 allows for an array up to length
+       * Integer.MAX_VALUE-4, if array is filled by the worst case stack size
+       * increasing scenario. More explanations are given in section 4 of:
+       * http://envisage-project.eu/wp-content/uploads/2015/02/sorting.pdf
        */
       int stackLen = (len <    120  ?  5 :
                       len <   1542  ? 10 :
-                      len < 119151  ? 19 : 40);
+                      len < 119151  ? 24 : 49);
       runBase = new int[stackLen];
       runLen = new int[stackLen];
     }
@@ -439,15 +443,20 @@ class TimSort<K, Buffer> {
      * This method is called each time a new run is pushed onto the stack,
      * so the invariants are guaranteed to hold for i < stackSize upon
      * entry to the method.
+     *
+     * Thanks to Stijn de Gouw, Jurriaan Rot, Frank S. de Boer,
+     * Richard Bubel and Reiner Hahnle, this is fixed with respect to
+     * the analysis in "On the Worst-Case Complexity of TimSort" by
+     * Nicolas Auger, Vincent Jug, Cyril Nicaud, and Carine Pivoteau.
      */
     private void mergeCollapse() {
       while (stackSize > 1) {
         int n = stackSize - 2;
-        if ( (n >= 1 && runLen[n-1] <= runLen[n] + runLen[n+1])
-          || (n >= 2 && runLen[n-2] <= runLen[n] + runLen[n-1])) {
+        if (n > 0 && runLen[n-1] <= runLen[n] + runLen[n+1] ||
+            n > 1 && runLen[n-2] <= runLen[n] + runLen[n-1]) {
           if (runLen[n - 1] < runLen[n + 1])
             n--;
-        } else if (runLen[n] > runLen[n + 1]) {
+        } else if (n < 0 || runLen[n] > runLen[n + 1]) {
           break; // Invariant is established
         }
         mergeAt(n);


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

Reply via email to