This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push: new 1304974 [SPARK-26955][CORE] Align Spark's TimSort to jdk11 implementation 1304974 is described below commit 13049745391f97601ebc1402e77210638d908e40 Author: Maxim Gekk <maxim.g...@databricks.com> AuthorDate: Thu Feb 21 22:18:23 2019 -0600 [SPARK-26955][CORE] Align Spark's TimSort to jdk11 implementation ## What changes were proposed in this pull request? Spark's TimSort deviates from JDK 11 TimSort in a couple places: - `stackLen` was increased in jdk - additional cases for break in `mergeCollapse`: `n < 0` In the PR, I propose to align Spark TimSort to jdk implementation. ## How was this patch tested? By existing test suites, in particular, `SorterSuite`. Closes #23858 from MaxGekk/timsort-java-alignment. Authored-by: Maxim Gekk <maxim.g...@databricks.com> Signed-off-by: Sean Owen <sean.o...@databricks.com> --- .../java/org/apache/spark/util/collection/TimSort.java | 17 +++++++++++++---- 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/core/src/main/java/org/apache/spark/util/collection/TimSort.java b/core/src/main/java/org/apache/spark/util/collection/TimSort.java index 40b5fb7..3142866 100644 --- a/core/src/main/java/org/apache/spark/util/collection/TimSort.java +++ b/core/src/main/java/org/apache/spark/util/collection/TimSort.java @@ -409,10 +409,14 @@ class TimSort<K, Buffer> { * large) stack lengths for smaller arrays. The "magic numbers" in the * computation below must be changed if MIN_MERGE is decreased. See * the MIN_MERGE declaration above for more information. + * The maximum value of 49 allows for an array up to length + * Integer.MAX_VALUE-4, if array is filled by the worst case stack size + * increasing scenario. More explanations are given in section 4 of: + * http://envisage-project.eu/wp-content/uploads/2015/02/sorting.pdf */ int stackLen = (len < 120 ? 5 : len < 1542 ? 10 : - len < 119151 ? 19 : 40); + len < 119151 ? 24 : 49); runBase = new int[stackLen]; runLen = new int[stackLen]; } @@ -439,15 +443,20 @@ class TimSort<K, Buffer> { * This method is called each time a new run is pushed onto the stack, * so the invariants are guaranteed to hold for i < stackSize upon * entry to the method. + * + * Thanks to Stijn de Gouw, Jurriaan Rot, Frank S. de Boer, + * Richard Bubel and Reiner Hahnle, this is fixed with respect to + * the analysis in "On the Worst-Case Complexity of TimSort" by + * Nicolas Auger, Vincent Jug, Cyril Nicaud, and Carine Pivoteau. */ private void mergeCollapse() { while (stackSize > 1) { int n = stackSize - 2; - if ( (n >= 1 && runLen[n-1] <= runLen[n] + runLen[n+1]) - || (n >= 2 && runLen[n-2] <= runLen[n] + runLen[n-1])) { + if (n > 0 && runLen[n-1] <= runLen[n] + runLen[n+1] || + n > 1 && runLen[n-2] <= runLen[n] + runLen[n-1]) { if (runLen[n - 1] < runLen[n + 1]) n--; - } else if (runLen[n] > runLen[n + 1]) { + } else if (n < 0 || runLen[n] > runLen[n + 1]) { break; // Invariant is established } mergeAt(n); --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org