spark git commit: [SPARK-25089][R] removing lintr checks for 2.0

2018-08-10 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 dccd8c754 -> 5ed89ceaf


[SPARK-25089][R] removing lintr checks for 2.0

## What changes were proposed in this pull request?

Since 2.0 will be EOLed in the not-too-distant future, and we'll be moving the 
builds from CentOS to Ubuntu, it's fine to disable R linting rather than going 
down the rabbit hole of trying to fix the existing violations.

## How was this patch tested?

The build system will test this.

Closes #22074 from shaneknapp/removing-lintr-2.0.

Authored-by: shane knapp 
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5ed89cea
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5ed89cea
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5ed89cea

Branch: refs/heads/branch-2.0
Commit: 5ed89ceaf367590f79401abbf9ff7fc66507fe4e
Parents: dccd8c7
Author: shane knapp 
Authored: Fri Aug 10 18:07:18 2018 -0500
Committer: Sean Owen 
Committed: Fri Aug 10 18:07:18 2018 -0500

--
 dev/run-tests.py | 14 --
 1 file changed, 14 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/5ed89cea/dev/run-tests.py
--
diff --git a/dev/run-tests.py b/dev/run-tests.py
index 43e3bf6..063a879 100755
--- a/dev/run-tests.py
+++ b/dev/run-tests.py
@@ -212,18 +212,6 @@ def run_python_style_checks():
     run_cmd([os.path.join(SPARK_HOME, "dev", "lint-python")])
 
 
-def run_sparkr_style_checks():
-    set_title_and_block("Running R style checks", "BLOCK_R_STYLE")
-
-    if which("R"):
-        # R style check should be executed after `install-dev.sh`.
-        # Since warnings about `no visible global function definition` appear
-        # without the installation. SEE ALSO: SPARK-9121.
-        run_cmd([os.path.join(SPARK_HOME, "dev", "lint-r")])
-    else:
-        print("Ignoring SparkR style check as R was not found in PATH")
-
-
 def build_spark_documentation():
     set_title_and_block("Building Spark Documentation", "BLOCK_DOCUMENTATION")
     os.environ["PRODUCTION"] = "1 jekyll build"
@@ -555,8 +543,6 @@ def main():
         pass
     if not changed_files or any(f.endswith(".py") for f in changed_files):
         run_python_style_checks()
-    if not changed_files or any(f.endswith(".R") for f in changed_files):
-        run_sparkr_style_checks()
 
     # determine if docs were changed and if we're inside the amplab environment
     # note - the below commented out until *all* Jenkins workers can get `jekyll` installed


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-25089][R] removing lintr checks for 2.1

2018-08-10 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-2.1 42229430f -> 09f70f5fd


[SPARK-25089][R] removing lintr checks for 2.1

## What changes were proposed in this pull request?

Since 2.1 will be EOLed in the not-too-distant future, and we'll be moving the 
builds from CentOS to Ubuntu, it's fine to disable R linting rather than going 
down the rabbit hole of trying to fix the existing violations.

## How was this patch tested?

The build system will test this.

Closes #22073 from shaneknapp/removing-lintr.

Authored-by: shane knapp 
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/09f70f5f
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/09f70f5f
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/09f70f5f

Branch: refs/heads/branch-2.1
Commit: 09f70f5fd681dd3f38b7d9e6514f1fd63703e7f1
Parents: 4222943
Author: shane knapp 
Authored: Fri Aug 10 18:06:54 2018 -0500
Committer: Sean Owen 
Committed: Fri Aug 10 18:06:54 2018 -0500

--
 dev/run-tests.py | 14 --
 1 file changed, 14 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/09f70f5f/dev/run-tests.py
--
diff --git a/dev/run-tests.py b/dev/run-tests.py
index f24aac9..8ff8f51 100755
--- a/dev/run-tests.py
+++ b/dev/run-tests.py
@@ -212,18 +212,6 @@ def run_python_style_checks():
     run_cmd([os.path.join(SPARK_HOME, "dev", "lint-python")])
 
 
-def run_sparkr_style_checks():
-    set_title_and_block("Running R style checks", "BLOCK_R_STYLE")
-
-    if which("R"):
-        # R style check should be executed after `install-dev.sh`.
-        # Since warnings about `no visible global function definition` appear
-        # without the installation. SEE ALSO: SPARK-9121.
-        run_cmd([os.path.join(SPARK_HOME, "dev", "lint-r")])
-    else:
-        print("Ignoring SparkR style check as R was not found in PATH")
-
-
 def build_spark_documentation():
     set_title_and_block("Building Spark Documentation", "BLOCK_DOCUMENTATION")
     os.environ["PRODUCTION"] = "1 jekyll build"
@@ -561,8 +549,6 @@ def main():
         pass
     if not changed_files or any(f.endswith(".py") for f in changed_files):
         run_python_style_checks()
-    if not changed_files or any(f.endswith(".R") for f in changed_files):
-        run_sparkr_style_checks()
 
     # determine if docs were changed and if we're inside the amplab environment
     # note - the below commented out until *all* Jenkins workers can get `jekyll` installed


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-24908][R][STYLE] removing spaces to make lintr happy

2018-08-10 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-2.3 04c652064 -> a0a7e41cf


[SPARK-24908][R][STYLE] removing spaces to make lintr happy

## What changes were proposed in this pull request?

During my travails in porting Spark builds to run on our CentOS worker, I 
managed to recreate (as best I could) the CentOS environment on our new 
Ubuntu testing machine.

While running my initial builds, lintr was crashing on some extraneous spaces 
in test_basic.R (see: 
https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.6-ubuntu-test/862/console).

After removing those spaces, the Ubuntu build happily passed the lintr tests.

## How was this patch tested?

I then tested this against a modified spark-master-test-sbt-hadoop-2.6 build 
(see 
https://amplab.cs.berkeley.edu/jenkins/view/RISELab%20Infra/job/testing-spark-master-test-with-updated-R-crap/4/),
which scp'ed a copy of test_basic.R into the repo after the git clone. 
Everything seems to be working happily.

Author: shane knapp 

Closes #21864 from shaneknapp/fixing-R-lint-spacing.

(cherry picked from commit 3efdf35327be38115b04b08e9c8d0aa282a904ab)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a0a7e41c
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a0a7e41c
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a0a7e41c

Branch: refs/heads/branch-2.3
Commit: a0a7e41cfbcebf1eb0228b4acfdb0381c8eeb79f
Parents: 04c6520
Author: shane knapp 
Authored: Tue Jul 24 16:13:57 2018 -0700
Committer: Sean Owen 
Committed: Fri Aug 10 14:52:04 2018 -0500

--
 R/pkg/inst/tests/testthat/test_basic.R | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/a0a7e41c/R/pkg/inst/tests/testthat/test_basic.R
--
diff --git a/R/pkg/inst/tests/testthat/test_basic.R b/R/pkg/inst/tests/testthat/test_basic.R
index 243f5f0..80df3d8 100644
--- a/R/pkg/inst/tests/testthat/test_basic.R
+++ b/R/pkg/inst/tests/testthat/test_basic.R
@@ -18,9 +18,9 @@
 context("basic tests for CRAN")
 
 test_that("create DataFrame from list or data.frame", {
-  tryCatch( checkJavaVersion(),
+  tryCatch(checkJavaVersion(),
 error = function(e) { skip("error on Java check") },
-warning = function(e) { skip("warning on Java check") } )
+warning = function(e) { skip("warning on Java check") })
 
   sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE,
  sparkConfig = sparkRTestConfig)
@@ -54,9 +54,9 @@ test_that("create DataFrame from list or data.frame", {
 })
 
 test_that("spark.glm and predict", {
-  tryCatch( checkJavaVersion(),
+  tryCatch(checkJavaVersion(),
 error = function(e) { skip("error on Java check") },
-warning = function(e) { skip("warning on Java check") } )
+warning = function(e) { skip("warning on Java check") })
 
   sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE,
  sparkConfig = sparkRTestConfig)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r28663 - in /dev/spark/2.4.0-SNAPSHOT-2018_08_10_12_02-f5aba65-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s

2018-08-10 Thread pwendell
Author: pwendell
Date: Fri Aug 10 19:16:32 2018
New Revision: 28663

Log:
Apache Spark 2.4.0-SNAPSHOT-2018_08_10_12_02-f5aba65 docs


[This commit notification would consist of 1476 parts, 
which exceeds the limit of 50 ones, so it was shortened to the summary.]

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-25081][CORE] Nested spill in ShuffleExternalSorter should not access released memory page

2018-08-10 Thread zsxwing
Repository: spark
Updated Branches:
  refs/heads/branch-2.3 7306ac71d -> 04c652064


[SPARK-25081][CORE] Nested spill in ShuffleExternalSorter should not access 
released memory page

## What changes were proposed in this pull request?

This issue is pretty similar to 
[SPARK-21907](https://issues.apache.org/jira/browse/SPARK-21907).

"allocateArray" in 
[ShuffleInMemorySorter.reset](https://github.com/apache/spark/blob/9b8521e53e56a53b44c02366a99f8a8ee1307bbf/core/src/main/java/org/apache/spark/shuffle/sort/ShuffleInMemorySorter.java#L99)
 may trigger a spill and cause ShuffleInMemorySorter access the released 
`array`. Another task may get the same memory page from the pool. This will 
cause two tasks access the same memory page. When a task reads memory written 
by another task, many types of failures may happen. Here are some examples I  
have seen:

- JVM crash. (This is easy to reproduce in a unit test as we fill newly 
allocated and deallocated memory with 0xa5 and 0x5a bytes which usually points 
to an invalid memory address)
- java.lang.IllegalArgumentException: Comparison method violates its general 
contract!
- java.lang.NullPointerException at 
org.apache.spark.memory.TaskMemoryManager.getPage(TaskMemoryManager.java:384)
- java.lang.UnsupportedOperationException: Cannot grow BufferHolder by size 
-536870912 because the size after growing exceeds size limitation 2147483632

This PR resets the sorter's state in `ShuffleInMemorySorter.reset` before calling 
`allocateArray` to fix the issue.

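For readers skimming the thread, here is a minimal Scala sketch of the ordering the fix relies on. The types are hypothetical stand-ins (the real change is in `ShuffleInMemorySorter.java`, shown in the diff below); the point is only that every field a nested spill could observe is made safe before `allocateArray` runs.

```scala
object ResetOrderingSketch {
  // Hypothetical stand-ins for Spark's internal memory-manager types, for illustration only.
  trait Page
  trait Consumer {
    def allocateArray(size: Int): Page // may itself trigger a nested spill on the caller
    def freeArray(page: Page): Unit
  }

  class SorterSketch(consumer: Consumer, initialSize: Int) {
    private var array: Page = consumer.allocateArray(initialSize)
    private var pos: Int = 0

    def reset(): Unit = {
      // 1. Zero the record count first, so a spill triggered below sees an empty sorter.
      pos = 0
      consumer.freeArray(array)
      // 2. Drop the reference: the freed page may already have been handed to another task,
      //    and allocateArray below may spill or throw before a new page is assigned.
      array = null
      // 3. Only now re-allocate; any nested spill observes a consistent, empty state.
      array = consumer.allocateArray(initialSize)
    }
  }
}
```
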
## How was this patch tested?

The new unit test will make JVM crash without the fix.

Closes #22062 from zsxwing/SPARK-25081.

Authored-by: Shixiong Zhu 
Signed-off-by: Shixiong Zhu 
(cherry picked from commit f5aba657396bd4e2e03dd06491a2d169a99592a7)
Signed-off-by: Shixiong Zhu 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/04c65206
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/04c65206
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/04c65206

Branch: refs/heads/branch-2.3
Commit: 04c652064861720d991675b7f5b53f2bbca9d14d
Parents: 7306ac7
Author: Shixiong Zhu 
Authored: Fri Aug 10 10:53:44 2018 -0700
Committer: Shixiong Zhu 
Committed: Fri Aug 10 10:54:03 2018 -0700

--
 .../shuffle/sort/ShuffleInMemorySorter.java |  12 +-
 .../sort/ShuffleExternalSorterSuite.scala   | 111 +++
 2 files changed, 121 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/04c65206/core/src/main/java/org/apache/spark/shuffle/sort/ShuffleInMemorySorter.java
--
diff --git a/core/src/main/java/org/apache/spark/shuffle/sort/ShuffleInMemorySorter.java b/core/src/main/java/org/apache/spark/shuffle/sort/ShuffleInMemorySorter.java
index dc36809..0d06912 100755
--- a/core/src/main/java/org/apache/spark/shuffle/sort/ShuffleInMemorySorter.java
+++ b/core/src/main/java/org/apache/spark/shuffle/sort/ShuffleInMemorySorter.java
@@ -66,7 +66,7 @@ final class ShuffleInMemorySorter {
    */
   private int usableCapacity = 0;
 
-  private int initialSize;
+  private final int initialSize;
 
   ShuffleInMemorySorter(MemoryConsumer consumer, int initialSize, boolean useRadixSort) {
     this.consumer = consumer;
@@ -95,12 +95,20 @@ final class ShuffleInMemorySorter {
   }
 
   public void reset() {
+    // Reset `pos` here so that `spill` triggered by the below `allocateArray` will be no-op.
+    pos = 0;
     if (consumer != null) {
       consumer.freeArray(array);
+      // As `array` has been released, we should set it to `null` to avoid accessing it before
+      // `allocateArray` returns. `usableCapacity` is also set to `0` to avoid any codes writing
+      // data to `ShuffleInMemorySorter` when `array` is `null` (e.g., in
+      // ShuffleExternalSorter.growPointerArrayIfNecessary, we may try to access
+      // `ShuffleInMemorySorter` when `allocateArray` throws SparkOutOfMemoryError).
+      array = null;
+      usableCapacity = 0;
       array = consumer.allocateArray(initialSize);
       usableCapacity = getUsableCapacity();
     }
-    pos = 0;
   }
 
   public void expandPointerArray(LongArray newArray) {

http://git-wip-us.apache.org/repos/asf/spark/blob/04c65206/core/src/test/scala/org/apache/spark/shuffle/sort/ShuffleExternalSorterSuite.scala
--
diff --git a/core/src/test/scala/org/apache/spark/shuffle/sort/ShuffleExternalSorterSuite.scala b/core/src/test/scala/org/apache/spark/shuffle/sort/ShuffleExternalSorterSuite.scala
new file mode 100644
index 000..b9f0e87
--- /dev/null
+++ b/core/src/test/scala/org/apache/spark/shuffle/sort/ShuffleExternalSorterSuite.scala
@@ -0,0 +1,111 @@
+/*
+ * Licensed to the Apache 

spark git commit: [SPARK-25081][CORE] Nested spill in ShuffleExternalSorter should not access released memory page

2018-08-10 Thread zsxwing
Repository: spark
Updated Branches:
  refs/heads/master 91cdab51c -> f5aba6573


[SPARK-25081][CORE] Nested spill in ShuffleExternalSorter should not access 
released memory page

## What changes were proposed in this pull request?

This issue is pretty similar to 
[SPARK-21907](https://issues.apache.org/jira/browse/SPARK-21907).

"allocateArray" in 
[ShuffleInMemorySorter.reset](https://github.com/apache/spark/blob/9b8521e53e56a53b44c02366a99f8a8ee1307bbf/core/src/main/java/org/apache/spark/shuffle/sort/ShuffleInMemorySorter.java#L99)
 may trigger a spill and cause ShuffleInMemorySorter access the released 
`array`. Another task may get the same memory page from the pool. This will 
cause two tasks access the same memory page. When a task reads memory written 
by another task, many types of failures may happen. Here are some examples I  
have seen:

- JVM crash. (This is easy to reproduce in a unit test as we fill newly 
allocated and deallocated memory with 0xa5 and 0x5a bytes which usually points 
to an invalid memory address)
- java.lang.IllegalArgumentException: Comparison method violates its general 
contract!
- java.lang.NullPointerException at 
org.apache.spark.memory.TaskMemoryManager.getPage(TaskMemoryManager.java:384)
- java.lang.UnsupportedOperationException: Cannot grow BufferHolder by size 
-536870912 because the size after growing exceeds size limitation 2147483632

This PR resets the sorter's state in `ShuffleInMemorySorter.reset` before calling 
`allocateArray` to fix the issue.

## How was this patch tested?

The new unit test will make JVM crash without the fix.

Closes #22062 from zsxwing/SPARK-25081.

Authored-by: Shixiong Zhu 
Signed-off-by: Shixiong Zhu 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f5aba657
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f5aba657
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f5aba657

Branch: refs/heads/master
Commit: f5aba657396bd4e2e03dd06491a2d169a99592a7
Parents: 91cdab5
Author: Shixiong Zhu 
Authored: Fri Aug 10 10:53:44 2018 -0700
Committer: Shixiong Zhu 
Committed: Fri Aug 10 10:53:44 2018 -0700

--
 .../shuffle/sort/ShuffleInMemorySorter.java |  12 +-
 .../sort/ShuffleExternalSorterSuite.scala   | 111 +++
 2 files changed, 121 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/f5aba657/core/src/main/java/org/apache/spark/shuffle/sort/ShuffleInMemorySorter.java
--
diff --git a/core/src/main/java/org/apache/spark/shuffle/sort/ShuffleInMemorySorter.java b/core/src/main/java/org/apache/spark/shuffle/sort/ShuffleInMemorySorter.java
index 8f49859..4b48599 100755
--- a/core/src/main/java/org/apache/spark/shuffle/sort/ShuffleInMemorySorter.java
+++ b/core/src/main/java/org/apache/spark/shuffle/sort/ShuffleInMemorySorter.java
@@ -65,7 +65,7 @@ final class ShuffleInMemorySorter {
    */
   private int usableCapacity = 0;
 
-  private int initialSize;
+  private final int initialSize;
 
   ShuffleInMemorySorter(MemoryConsumer consumer, int initialSize, boolean useRadixSort) {
     this.consumer = consumer;
@@ -94,12 +94,20 @@ final class ShuffleInMemorySorter {
   }
 
   public void reset() {
+    // Reset `pos` here so that `spill` triggered by the below `allocateArray` will be no-op.
+    pos = 0;
     if (consumer != null) {
       consumer.freeArray(array);
+      // As `array` has been released, we should set it to `null` to avoid accessing it before
+      // `allocateArray` returns. `usableCapacity` is also set to `0` to avoid any codes writing
+      // data to `ShuffleInMemorySorter` when `array` is `null` (e.g., in
+      // ShuffleExternalSorter.growPointerArrayIfNecessary, we may try to access
+      // `ShuffleInMemorySorter` when `allocateArray` throws SparkOutOfMemoryError).
+      array = null;
+      usableCapacity = 0;
       array = consumer.allocateArray(initialSize);
       usableCapacity = getUsableCapacity();
     }
-    pos = 0;
   }
 
   public void expandPointerArray(LongArray newArray) {

http://git-wip-us.apache.org/repos/asf/spark/blob/f5aba657/core/src/test/scala/org/apache/spark/shuffle/sort/ShuffleExternalSorterSuite.scala
--
diff --git a/core/src/test/scala/org/apache/spark/shuffle/sort/ShuffleExternalSorterSuite.scala b/core/src/test/scala/org/apache/spark/shuffle/sort/ShuffleExternalSorterSuite.scala
new file mode 100644
index 000..b9f0e87
--- /dev/null
+++ b/core/src/test/scala/org/apache/spark/shuffle/sort/ShuffleExternalSorterSuite.scala
@@ -0,0 +1,111 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file 

svn commit: r28661 - in /dev/spark/2.3.3-SNAPSHOT-2018_08_10_10_02-7306ac7-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s

2018-08-10 Thread pwendell
Author: pwendell
Date: Fri Aug 10 17:15:48 2018
New Revision: 28661

Log:
Apache Spark 2.3.3-SNAPSHOT-2018_08_10_10_02-7306ac7 docs


[This commit notification would consist of 1443 parts, 
which exceeds the limit of 50 ones, so it was shortened to the summary.]

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [MINOR][BUILD] Add ECCN notice required by http://www.apache.org/dev/crypto.html

2018-08-10 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-2.2 b283c1f05 -> 051ea3a62


[MINOR][BUILD] Add ECCN notice required by http://www.apache.org/dev/crypto.html

Add ECCN notice required by http://www.apache.org/dev/crypto.html
See https://issues.apache.org/jira/browse/LEGAL-398

This should probably be backported to 2.3, 2.2, as that's when the key dep 
(commons crypto) turned up. BC is actually unused, but still there.

N/A

Closes #22064 from srowen/ECCN.

Authored-by: Sean Owen 
Signed-off-by: Sean Owen 
(cherry picked from commit 91cdab51ccb3a4e3b6d76132d00f3da30598735b)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/051ea3a6
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/051ea3a6
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/051ea3a6

Branch: refs/heads/branch-2.2
Commit: 051ea3a6217fa1038e930906c58d8e86e9626e35
Parents: b283c1f
Author: Sean Owen 
Authored: Fri Aug 10 11:15:36 2018 -0500
Committer: Sean Owen 
Committed: Fri Aug 10 11:19:51 2018 -0500

--
 NOTICE | 25 +
 1 file changed, 25 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/051ea3a6/NOTICE
--
diff --git a/NOTICE b/NOTICE
index f4b64b5..737189a 100644
--- a/NOTICE
+++ b/NOTICE
@@ -5,6 +5,31 @@ This product includes software developed at
 The Apache Software Foundation (http://www.apache.org/).
 
 
+Export Control Notice
+-
+
+This distribution includes cryptographic software. The country in which you 
currently reside may have
+restrictions on the import, possession, use, and/or re-export to another 
country, of encryption software.
+BEFORE using any encryption software, please check your country's laws, 
regulations and policies concerning
+the import, possession, or use, and re-export of encryption software, to see 
if this is permitted. See
+<http://www.wassenaar.org/> for more information.
+
+The U.S. Government Department of Commerce, Bureau of Industry and Security 
(BIS), has classified this
+software as Export Commodity Control Number (ECCN) 5D002.C.1, which includes 
information security software
+using or performing cryptographic functions with asymmetric algorithms. The 
form and manner of this Apache
+Software Foundation distribution makes it eligible for export under the 
License Exception ENC Technology
+Software Unrestricted (TSU) exception (see the BIS Export Administration 
Regulations, Section 740.13) for
+both object code and source code.
+
+The following provides more details on the included cryptographic software:
+
+This software uses Apache Commons Crypto 
(https://commons.apache.org/proper/commons-crypto/) to
+support authentication, and encryption and decryption of data sent across the 
network between
+services.
+
+This software includes Bouncy Castle (http://bouncycastle.org/) to support the 
jets3t library.
+
+
 
 Common Development and Distribution License 1.0
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [MINOR][BUILD] Add ECCN notice required by http://www.apache.org/dev/crypto.html

2018-08-10 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-2.3 e66f3f9b1 -> 7306ac71d


[MINOR][BUILD] Add ECCN notice required by http://www.apache.org/dev/crypto.html

Add ECCN notice required by http://www.apache.org/dev/crypto.html
See https://issues.apache.org/jira/browse/LEGAL-398

This should probably be backported to 2.3, 2.2, as that's when the key dep 
(commons crypto) turned up. BC is actually unused, but still there.

N/A

Closes #22064 from srowen/ECCN.

Authored-by: Sean Owen 
Signed-off-by: Sean Owen 
(cherry picked from commit 91cdab51ccb3a4e3b6d76132d00f3da30598735b)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7306ac71
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/7306ac71
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/7306ac71

Branch: refs/heads/branch-2.3
Commit: 7306ac71da0e31fa9655c5838dc7fcb6e4c0b7a0
Parents: e66f3f9
Author: Sean Owen 
Authored: Fri Aug 10 11:15:36 2018 -0500
Committer: Sean Owen 
Committed: Fri Aug 10 11:18:40 2018 -0500

--
 NOTICE | 25 +
 1 file changed, 25 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/7306ac71/NOTICE
--
diff --git a/NOTICE b/NOTICE
index 6ec240e..876d606 100644
--- a/NOTICE
+++ b/NOTICE
@@ -5,6 +5,31 @@ This product includes software developed at
 The Apache Software Foundation (http://www.apache.org/).
 
 
+Export Control Notice
+-
+
+This distribution includes cryptographic software. The country in which you 
currently reside may have
+restrictions on the import, possession, use, and/or re-export to another 
country, of encryption software.
+BEFORE using any encryption software, please check your country's laws, 
regulations and policies concerning
+the import, possession, or use, and re-export of encryption software, to see 
if this is permitted. See
+<http://www.wassenaar.org/> for more information.
+
+The U.S. Government Department of Commerce, Bureau of Industry and Security 
(BIS), has classified this
+software as Export Commodity Control Number (ECCN) 5D002.C.1, which includes 
information security software
+using or performing cryptographic functions with asymmetric algorithms. The 
form and manner of this Apache
+Software Foundation distribution makes it eligible for export under the 
License Exception ENC Technology
+Software Unrestricted (TSU) exception (see the BIS Export Administration 
Regulations, Section 740.13) for
+both object code and source code.
+
+The following provides more details on the included cryptographic software:
+
+This software uses Apache Commons Crypto 
(https://commons.apache.org/proper/commons-crypto/) to
+support authentication, and encryption and decryption of data sent across the 
network between
+services.
+
+This software includes Bouncy Castle (http://bouncycastle.org/) to support the 
jets3t library.
+
+
 
 Common Development and Distribution License 1.0
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [MINOR][BUILD] Add ECCN notice required by http://www.apache.org/dev/crypto.html

2018-08-10 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 1dd0f1744 -> 91cdab51c


[MINOR][BUILD] Add ECCN notice required by http://www.apache.org/dev/crypto.html

## What changes were proposed in this pull request?

Add ECCN notice required by http://www.apache.org/dev/crypto.html
See https://issues.apache.org/jira/browse/LEGAL-398

This should probably be backported to 2.3, 2.2, as that's when the key dep 
(commons crypto) turned up. BC is actually unused, but still there.

## How was this patch tested?

N/A

Closes #22064 from srowen/ECCN.

Authored-by: Sean Owen 
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/91cdab51
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/91cdab51
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/91cdab51

Branch: refs/heads/master
Commit: 91cdab51ccb3a4e3b6d76132d00f3da30598735b
Parents: 1dd0f17
Author: Sean Owen 
Authored: Fri Aug 10 11:15:36 2018 -0500
Committer: Sean Owen 
Committed: Fri Aug 10 11:15:36 2018 -0500

--
 NOTICE| 24 
 NOTICE-binary | 25 +
 2 files changed, 49 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/91cdab51/NOTICE
--
diff --git a/NOTICE b/NOTICE
index 9246cc5..23cb53f 100644
--- a/NOTICE
+++ b/NOTICE
@@ -4,3 +4,27 @@ Copyright 2014 and onwards The Apache Software Foundation.
 This product includes software developed at
 The Apache Software Foundation (http://www.apache.org/).
 
+
+Export Control Notice
+-
+
+This distribution includes cryptographic software. The country in which you 
currently reside may have
+restrictions on the import, possession, use, and/or re-export to another 
country, of encryption software.
+BEFORE using any encryption software, please check your country's laws, 
regulations and policies concerning
+the import, possession, or use, and re-export of encryption software, to see 
if this is permitted. See
+<http://www.wassenaar.org/> for more information.
+
+The U.S. Government Department of Commerce, Bureau of Industry and Security 
(BIS), has classified this
+software as Export Commodity Control Number (ECCN) 5D002.C.1, which includes 
information security software
+using or performing cryptographic functions with asymmetric algorithms. The 
form and manner of this Apache
+Software Foundation distribution makes it eligible for export under the 
License Exception ENC Technology
+Software Unrestricted (TSU) exception (see the BIS Export Administration 
Regulations, Section 740.13) for
+both object code and source code.
+
+The following provides more details on the included cryptographic software:
+
+This software uses Apache Commons Crypto 
(https://commons.apache.org/proper/commons-crypto/) to
+support authentication, and encryption and decryption of data sent across the 
network between
+services.
+
+This software includes Bouncy Castle (http://bouncycastle.org/) to support the 
jets3t library.

http://git-wip-us.apache.org/repos/asf/spark/blob/91cdab51/NOTICE-binary
--
diff --git a/NOTICE-binary b/NOTICE-binary
index d56f99b..3155c38 100644
--- a/NOTICE-binary
+++ b/NOTICE-binary
@@ -5,6 +5,31 @@ This product includes software developed at
 The Apache Software Foundation (http://www.apache.org/).
 
 
+Export Control Notice
+-
+
+This distribution includes cryptographic software. The country in which you 
currently reside may have
+restrictions on the import, possession, use, and/or re-export to another 
country, of encryption software.
+BEFORE using any encryption software, please check your country's laws, 
regulations and policies concerning
+the import, possession, or use, and re-export of encryption software, to see 
if this is permitted. See
+<http://www.wassenaar.org/> for more information.
+
+The U.S. Government Department of Commerce, Bureau of Industry and Security 
(BIS), has classified this
+software as Export Commodity Control Number (ECCN) 5D002.C.1, which includes 
information security software
+using or performing cryptographic functions with asymmetric algorithms. The 
form and manner of this Apache
+Software Foundation distribution makes it eligible for export under the 
License Exception ENC Technology
+Software Unrestricted (TSU) exception (see the BIS Export Administration 
Regulations, Section 740.13) for
+both object code and source code.
+
+The following provides more details on the included cryptographic software:
+
+This software uses Apache Commons Crypto 
(https://commons.apache.org/proper/commons-crypto/) to
+support authentication, and encryption and decryption of data sent across the 
network between

svn commit: r28657 - in /dev/spark/2.4.0-SNAPSHOT-2018_08_10_08_02-1dd0f17-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s

2018-08-10 Thread pwendell
Author: pwendell
Date: Fri Aug 10 15:16:21 2018
New Revision: 28657

Log:
Apache Spark 2.4.0-SNAPSHOT-2018_08_10_08_02-1dd0f17 docs


[This commit notification would consist of 1476 parts, 
which exceeds the limit of 50 ones, so it was shortened to the summary.]

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-25036][SQL] Avoid discarding unmoored doc comment in Scala-2.12.

2018-08-10 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 4f1758509 -> 132bcceeb


[SPARK-25036][SQL] Avoid discarding unmoored doc comment in Scala-2.12.

## What changes were proposed in this pull request?

This PR avoids the following compilation errors when building with sbt in Scala 2.12.

```
[error] [warn] 
/home/ishizaki/Spark/PR/scala212/spark/mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala:410:
 discarding unmoored doc comment
[error] [warn] /**
[error] [warn]
[error] [warn] 
/home/ishizaki/Spark/PR/scala212/spark/mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala:441:
 discarding unmoored doc comment
[error] [warn] /**
[error] [warn]
...
[error] [warn] 
/home/ishizaki/Spark/PR/scala212/spark/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala:440:
 discarding unmoored doc comment
[error] [warn] /**
[error] [warn]
```

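For context, an "unmoored" doc comment is a Scaladoc-style `/** ... */` block sitting where it documents no definition, typically in statement position inside a method body; Scala 2.12 emits the warning quoted above for it, and with fatal warnings the build fails. A small illustration (not code from the PR) of the pattern, and of the plain-comment rewrite the PR applies:

```scala
object UnmooredDocExample {
  def compute(values: Seq[Int]): Int = {
    var total = 0
    /**
     * Accumulate each value.  (a Scaladoc comment attached to no definition: "unmoored")
     */
    values.foreach { v => total += v }

    /*
     * Rewritten as a plain block comment, as this PR does, the warning goes away.
     */
    total
  }
}
```
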
## How was this patch tested?

Existing UTs

Closes #22059 from kiszk/SPARK-25036d.

Authored-by: Kazuaki Ishizaki 
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/132bccee
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/132bccee
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/132bccee

Branch: refs/heads/master
Commit: 132bcceebb7723aea9845c9e207e572ecb44a4a2
Parents: 4f17585
Author: Kazuaki Ishizaki 
Authored: Fri Aug 10 07:32:52 2018 -0500
Committer: Sean Owen 
Committed: Fri Aug 10 07:32:52 2018 -0500

--
 .../main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala  | 4 ++--
 .../src/main/scala/org/apache/spark/deploy/yarn/Client.scala | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/132bccee/mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala
--
diff --git a/mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala b/mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala
index 918560a..4cdd172 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala
@@ -407,7 +407,7 @@ private[spark] object RandomForest extends Logging with Serializable {
       metadata.isMulticlassWithCategoricalFeatures)
     logDebug("using nodeIdCache = " + nodeIdCache.nonEmpty.toString)
 
-    /**
+    /*
      * Performs a sequential aggregation over a partition for a particular tree and node.
      *
      * For each feature, the aggregate sufficient statistics are updated for the relevant
@@ -438,7 +438,7 @@ private[spark] object RandomForest extends Logging with Serializable {
       }
     }
 
-    /**
+    /*
      * Performs a sequential aggregation over a partition.
      *
      * Each data point contributes to one node. For each feature,

http://git-wip-us.apache.org/repos/asf/spark/blob/132bccee/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
--
diff --git a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
index ed9879c..75614a4 100644
--- a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
+++ b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
@@ -437,7 +437,7 @@ private[spark] class Client(
       }
     }
 
-    /**
+    /*
      * Distribute a file to the cluster.
      *
      * If the file's path is a "local:" URI, it's actually not distributed. Other files are copied


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r28651 - in /dev/spark/2.4.0-SNAPSHOT-2018_08_10_04_02-4f17585-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s

2018-08-10 Thread pwendell
Author: pwendell
Date: Fri Aug 10 11:20:55 2018
New Revision: 28651

Log:
Apache Spark 2.4.0-SNAPSHOT-2018_08_10_04_02-4f17585 docs


[This commit notification would consist of 1476 parts, 
which exceeds the limit of 50 ones, so it was shortened to the summary.]

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-19355][SQL] Use map output statistics to improve global limit's parallelism

2018-08-10 Thread hvanhovell
Repository: spark
Updated Branches:
  refs/heads/master 9abe09bfc -> 4f1758509


[SPARK-19355][SQL] Use map output statistics to improve global limit's 
parallelism

## What changes were proposed in this pull request?

A logical `Limit` is performed physically by two operations, `LocalLimit` and 
`GlobalLimit`.

Most of the time, we gather all data into a single partition in order to run 
`GlobalLimit`. With a very large limit, shuffling the data causes a performance 
problem and also reduces parallelism.

We can avoid shuffling into a single partition if we don't care about data 
ordering. This patch implements that idea by running a map stage during the 
global limit that collects the number of rows in each partition. Each partition 
then locally retrieves its share of the limited data without any shuffle.

For example, suppose we have three partitions with (100, 100, 50) rows 
respectively. For a global limit of 100 rows, we may take (34, 33, 33) rows 
from each partition locally. After the global limit we still have three 
partitions (a small sketch follows below).

If the data has a certain ordering, we can't distribute the required rows evenly 
across partitions because that could change the ordering, but we can still 
avoid the shuffle.

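A self-contained sketch (not code from the patch) of the split described above: given the per-partition row counts reported by the map stage and the global limit, compute how many rows each partition should keep locally so that no single-partition shuffle is needed. The helper name and the exact distribution rule are illustrative; the actual operator works from `MapOutputStatistics`.

```scala
object GlobalLimitSketch {
  // Decide how many rows each partition keeps so the totals add up to `limit`
  // (or to all available rows if there are fewer), without shuffling everything
  // into a single partition.
  def takePerPartition(rowCounts: List[Long], limit: Long): List[Long] = {
    val taken = new Array[Long](rowCounts.size)
    var remaining = limit
    // First pass: give each partition an even share of what is still needed.
    var partitionsLeft = rowCounts.size
    for ((rows, i) <- rowCounts.zipWithIndex) {
      val share = math.min(rows, (remaining + partitionsLeft - 1) / partitionsLeft)
      taken(i) = share
      remaining -= share
      partitionsLeft -= 1
    }
    // Second pass: if small partitions could not absorb their share, hand the
    // leftover quota to partitions that still have spare rows.
    for ((rows, i) <- rowCounts.zipWithIndex if remaining > 0) {
      val extra = math.min(rows - taken(i), remaining)
      taken(i) += extra
      remaining -= extra
    }
    taken.toList
  }

  def main(args: Array[String]): Unit = {
    // The example from the description: partitions of 100, 100, 50 rows, global limit 100.
    println(takePerPartition(List(100L, 100L, 50L), 100L)) // List(34, 33, 33)
  }
}
```
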
## How was this patch tested?

Jenkins tests.

Author: Liang-Chi Hsieh 

Closes #16677 from viirya/improve-global-limit-parallelism.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4f175850
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4f175850
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4f175850

Branch: refs/heads/master
Commit: 4f175850985cfc4c64afb90d784bb292e81dc0b7
Parents: 9abe09b
Author: Liang-Chi Hsieh 
Authored: Fri Aug 10 11:32:15 2018 +0200
Committer: Herman van Hovell 
Committed: Fri Aug 10 11:32:15 2018 +0200

--
 .../sort/BypassMergeSortShuffleWriter.java  |   5 +-
 .../spark/shuffle/sort/UnsafeShuffleWriter.java |   3 +-
 .../org/apache/spark/MapOutputStatistics.scala  |   6 +-
 .../org/apache/spark/MapOutputTracker.scala |  10 +-
 .../org/apache/spark/scheduler/MapStatus.scala  |  43 +---
 .../spark/shuffle/sort/SortShuffleWriter.scala  |   3 +-
 .../shuffle/sort/UnsafeShuffleWriterSuite.java  |   2 +
 .../apache/spark/MapOutputTrackerSuite.scala|  28 ++---
 .../scala/org/apache/spark/ShuffleSuite.scala   |   1 +
 .../spark/scheduler/DAGSchedulerSuite.scala |  10 +-
 .../apache/spark/scheduler/MapStatusSuite.scala |  16 +--
 .../spark/serializer/KryoSerializerSuite.scala  |   3 +-
 .../catalyst/plans/physical/partitioning.scala  |  14 +++
 .../org/apache/spark/sql/internal/SQLConf.scala |   9 ++
 .../exchange/ShuffleExchangeExec.scala  |   8 ++
 .../org/apache/spark/sql/execution/limit.scala  | 101 ---
 .../test/resources/sql-tests/inputs/limit.sql   |   2 +
 .../inputs/subquery/in-subquery/in-limit.sql|   5 +-
 .../resources/sql-tests/results/limit.sql.out   |  92 +
 .../subquery/in-subquery/in-limit.sql.out   |  56 +-
 .../spark/sql/DataFrameAggregateSuite.scala |  12 ++-
 .../org/apache/spark/sql/SQLQuerySuite.scala|  11 +-
 .../execution/ExchangeCoordinatorSuite.scala|   6 +-
 .../spark/sql/execution/PlannerSuite.scala  |   4 +-
 .../hive/execution/HiveCompatibilitySuite.scala |   4 +
 .../spark/sql/hive/execution/PruningSuite.scala |   8 ++
 26 files changed, 322 insertions(+), 140 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/4f175850/core/src/main/java/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriter.java
--
diff --git a/core/src/main/java/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriter.java b/core/src/main/java/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriter.java
index 323a5d3..e3bd549 100644
--- a/core/src/main/java/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriter.java
+++ b/core/src/main/java/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriter.java
@@ -125,7 +125,7 @@ final class BypassMergeSortShuffleWriter extends 
ShuffleWriter {
 if (!records.hasNext()) {
   partitionLengths = new long[numPartitions];
   shuffleBlockResolver.writeIndexFileAndCommit(shuffleId, mapId, 
partitionLengths, null);
-  mapStatus = MapStatus$.MODULE$.apply(blockManager.shuffleServerId(), 
partitionLengths);
+  mapStatus = MapStatus$.MODULE$.apply(blockManager.shuffleServerId(), 
partitionLengths, 0);
   return;
 }
 final SerializerInstance serInstance = serializer.newInstance();
@@ -167,7 +167,8 @@ final class BypassMergeSortShuffleWriter extends 
ShuffleWriter {
 logger.error("Error while deleting temp file {}", 
tmp.getAbsolutePath());
   }
 }
-mapStatus = 

spark git commit: [SPARK-24127][SS] Continuous text socket source

2018-08-10 Thread gurwls223
Repository: spark
Updated Branches:
  refs/heads/master ab1029fb8 -> 9abe09bfc


[SPARK-24127][SS] Continuous text socket source

## What changes were proposed in this pull request?

Support for the text socket stream in Spark Structured Streaming "continuous" 
mode. This is roughly based on the idea of ContinuousMemoryStream, where the 
executor queries the data from the driver over an RPC endpoint.

This makes it possible to create a Structured Streaming continuous pipeline that 
ingests data via "nc", and to run the examples (see the usage sketch below).

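To make the feature concrete, a hedged usage sketch (not from the PR): the existing `socket` source and `Trigger.Continuous`, both already public Spark APIs, now work together, so lines typed into `nc -lk 9999` can be processed in continuous mode.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

object ContinuousSocketExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("continuous-socket").getOrCreate()

    // Lines fed to `nc -lk 9999` arrive as a streaming DataFrame with a single "value" column.
    val lines = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", "9999")
      .load()

    // Continuous processing with a 1-second checkpoint interval; before this patch the
    // socket source only supported micro-batch mode.
    val query = lines.writeStream
      .format("console")
      .trigger(Trigger.Continuous("1 second"))
      .start()

    query.awaitTermination()
  }
}
```
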
## How was this patch tested?

Unit test and ran spark examples in structured streaming continuous mode.

Please review http://spark.apache.org/contributing.html before opening a pull 
request.

Closes #21199 from arunmahadevan/SPARK-24127.

Authored-by: Arun Mahadevan 
Signed-off-by: hyukjinkwon 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9abe09bf
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9abe09bf
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/9abe09bf

Branch: refs/heads/master
Commit: 9abe09bfc18580233acad676d1241684c7d8768d
Parents: ab1029f
Author: Arun Mahadevan 
Authored: Fri Aug 10 15:53:31 2018 +0800
Committer: hyukjinkwon 
Committed: Fri Aug 10 15:53:31 2018 +0800

--
 .../streaming/ContinuousRecordEndpoint.scala|  69 +
 .../continuous/ContinuousTextSocketSource.scala | 292 +++
 .../sources/ContinuousMemoryStream.scala|  32 +-
 .../execution/streaming/sources/socket.scala|  25 +-
 .../sources/TextSocketStreamSuite.scala |  98 ++-
 5 files changed, 482 insertions(+), 34 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/9abe09bf/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ContinuousRecordEndpoint.scala
--
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ContinuousRecordEndpoint.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ContinuousRecordEndpoint.scala
new file mode 100644
index 000..c9c2ebc
--- /dev/null
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ContinuousRecordEndpoint.scala
@@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.execution.streaming
+
+import org.apache.spark.SparkEnv
+import org.apache.spark.rpc.{RpcCallContext, RpcEnv, ThreadSafeRpcEndpoint}
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.sources.v2.reader.streaming.PartitionOffset
+
+case class ContinuousRecordPartitionOffset(partitionId: Int, offset: Int) 
extends PartitionOffset
+case class GetRecord(offset: ContinuousRecordPartitionOffset)
+
+/**
+ * A RPC end point for continuous readers to poll for
+ * records from the driver.
+ *
+ * @param buckets the data buckets. Each bucket contains a sequence of items 
to be
+ *returned for a partition. The number of buckets should be 
equal to
+ *to the number of partitions.
+ * @param lock a lock object for locking the buckets for read
+ */
+class ContinuousRecordEndpoint(buckets: Seq[Seq[Any]], lock: Object)
+  extends ThreadSafeRpcEndpoint {
+
+  private var startOffsets: Seq[Int] = List.fill(buckets.size)(0)
+
+  /**
+   * Sets the start offset.
+   *
+   * @param offsets the base offset per partition to be used
+   *while retrieving the data in {#receiveAndReply}.
+   */
+  def setStartOffsets(offsets: Seq[Int]): Unit = {
+lock.synchronized {
+  startOffsets = offsets
+}
+  }
+
+  override val rpcEnv: RpcEnv = SparkEnv.get.rpcEnv
+
+  /**
+   * Process messages from `RpcEndpointRef.ask`. If receiving a unmatched 
message,
+   * `SparkException` will be thrown and sent to `onError`.
+   */
+  override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, 
Unit] = {
+case GetRecord(ContinuousRecordPartitionOffset(partitionId, offset)) =>
+  lock.synchronized {
+val bufOffset = offset - 

svn commit: r28650 - in /dev/spark/2.4.0-SNAPSHOT-2018_08_10_00_02-ab1029f-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s

2018-08-10 Thread pwendell
Author: pwendell
Date: Fri Aug 10 07:16:55 2018
New Revision: 28650

Log:
Apache Spark 2.4.0-SNAPSHOT-2018_08_10_00_02-ab1029f docs


[This commit notification would consist of 1476 parts, 
which exceeds the limit of 50 ones, so it was shortened to the summary.]

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-23912][SQL][FOLLOWUP] Refactor ArrayDistinct

2018-08-10 Thread ueshin
Repository: spark
Updated Branches:
  refs/heads/master 0cea9e3cd -> ab1029fb8


[SPARK-23912][SQL][FOLLOWUP] Refactor ArrayDistinct

## What changes were proposed in this pull request?

This PR simplifies code generation for `ArrayDistinct`. #21966 enabled code 
generation only when the element type can be specialized by the hash set; this 
PR follows the same strategy.

Optimization of null handling will be implemented in #21912.

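For orientation, a small hedged example of the function whose code generation is being refactored (`array_distinct` is a public SQL function in this release line; the specialized hash-set path the PR keeps applies to primitive element types such as `int`):

```scala
import org.apache.spark.sql.SparkSession

object ArrayDistinctExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("array-distinct-example")
      .master("local[*]")
      .getOrCreate()

    // Integer elements take the specialized, code-generated OpenHashSet path;
    // non-primitive element types fall back to the interpreted evaluation.
    spark.sql("SELECT array_distinct(array(1, 2, 2, 3, 3)) AS distinct_ints").show()
    // Expected result: [1, 2, 3]

    spark.stop()
  }
}
```
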
## How was this patch tested?

Existing UTs

Closes #22044 from kiszk/SPARK-23912-follow.

Authored-by: Kazuaki Ishizaki 
Signed-off-by: Takuya UESHIN 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ab1029fb
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ab1029fb
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ab1029fb

Branch: refs/heads/master
Commit: ab1029fb8aae586e3af1238048e8b3dcfeb096f4
Parents: 0cea9e3
Author: Kazuaki Ishizaki 
Authored: Fri Aug 10 15:41:59 2018 +0900
Committer: Takuya UESHIN 
Committed: Fri Aug 10 15:41:59 2018 +0900

--
 .../expressions/collectionOperations.scala  | 215 ++-
 1 file changed, 61 insertions(+), 154 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/ab1029fb/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
--
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
index b37fdc6..5e3449d 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
@@ -3410,6 +3410,28 @@ case class ArrayDistinct(child: Expression)
 case _ => false
   }
 
+  @transient protected lazy val canUseSpecializedHashSet = elementType match {
+case ByteType | ShortType | IntegerType | LongType | FloatType | 
DoubleType => true
+case _ => false
+  }
+
+  @transient protected lazy val (hsPostFix, hsTypeName) = {
+val ptName = CodeGenerator.primitiveTypeName(elementType)
+elementType match {
+  // we cast byte/short to int when writing to the hash set.
+  case ByteType | ShortType | IntegerType => ("$mcI$sp", "Int")
+  case LongType => ("$mcJ$sp", ptName)
+  case FloatType => ("$mcF$sp", ptName)
+  case DoubleType => ("$mcD$sp", ptName)
+}
+  }
+
+  // we cast byte/short to int when writing to the hash set.
+  @transient protected lazy val hsValueCast = elementType match {
+case ByteType | ShortType => "(int) "
+case _ => ""
+  }
+
   override def nullSafeEval(array: Any): Any = {
 val data = array.asInstanceOf[ArrayData].toArray[AnyRef](elementType)
 if (elementTypeSupportEquals) {
@@ -3442,17 +3464,15 @@ case class ArrayDistinct(child: Expression)
   }
 
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
-nullSafeCodeGen(ctx, ev, (array) => {
-  val i = ctx.freshName("i")
-  val j = ctx.freshName("j")
-  val sizeOfDistinctArray = ctx.freshName("sizeOfDistinctArray")
-  val getValue1 = CodeGenerator.getValue(array, elementType, i)
-  val getValue2 = CodeGenerator.getValue(array, elementType, j)
-  val foundNullElement = ctx.freshName("foundNullElement")
-  val openHashSet = classOf[OpenHashSet[_]].getName
-  val hs = ctx.freshName("hs")
-  val classTag = s"scala.reflect.ClassTag$$.MODULE$$.Object()"
-  if (elementTypeSupportEquals) {
+if (canUseSpecializedHashSet) {
+  nullSafeCodeGen(ctx, ev, (array) => {
+val i = ctx.freshName("i")
+val sizeOfDistinctArray = ctx.freshName("sizeOfDistinctArray")
+val foundNullElement = ctx.freshName("foundNullElement")
+val openHashSet = classOf[OpenHashSet[_]].getName
+val hs = ctx.freshName("hs")
+val classTag = s"scala.reflect.ClassTag$$.MODULE$$.$hsTypeName()"
+val getValue = CodeGenerator.getValue(array, elementType, i)
 s"""
|int $sizeOfDistinctArray = 0;
|boolean $foundNullElement = false;
@@ -3461,53 +3481,26 @@ case class ArrayDistinct(child: Expression)
|  if ($array.isNullAt($i)) {
|$foundNullElement = true;
|  } else {
-   |$hs.add($getValue1);
+   |$hs.add$hsPostFix($hsValueCast$getValue);
|  }
|}
|$sizeOfDistinctArray = $hs.size() + ($foundNullElement ? 1 : 0);
|${genCodeForResult(ctx, ev, array, sizeOfDistinctArray)}
  """.stripMargin
-  } else {
-s"""
-   |int $sizeOfDistinctArray = 0;
-