svn commit: r28957 - in /dev/spark/2.4.0-SNAPSHOT-2018_08_25_12_01-c17a8ff-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell
Date: Sat Aug 25 19:16:22 2018
New Revision: 28957

Log:
Apache Spark 2.4.0-SNAPSHOT-2018_08_25_12_01-c17a8ff docs

[This commit notification would consist of 1478 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-25214][SS][FOLLOWUP] Fix the issue that Kafka v2 source may return duplicated records when `failOnDataLoss=false`
Repository: spark
Updated Branches:
  refs/heads/master 6c66ab8b3 -> c17a8ff52

[SPARK-25214][SS][FOLLOWUP] Fix the issue that Kafka v2 source may return duplicated records when `failOnDataLoss=false`

## What changes were proposed in this pull request?

This is a follow-up PR for #22207 to fix a potentially flaky test. `processAllAvailable` doesn't work for continuous processing, so we should not use it for a continuous query.

## How was this patch tested?

Jenkins.

Closes #22230 from zsxwing/SPARK-25214-2.

Authored-by: Shixiong Zhu
Signed-off-by: Shixiong Zhu

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c17a8ff5
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c17a8ff5
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c17a8ff5

Branch: refs/heads/master
Commit: c17a8ff52377871ab4ff96b648ebaf4112f0b5be
Parents: 6c66ab8
Author: Shixiong Zhu
Authored: Sat Aug 25 09:17:40 2018 -0700
Committer: Shixiong Zhu
Committed: Sat Aug 25 09:17:40 2018 -0700

--
 .../spark/sql/kafka010/KafkaDontFailOnDataLossSuite.scala | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/c17a8ff5/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaDontFailOnDataLossSuite.scala
--
diff --git a/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaDontFailOnDataLossSuite.scala b/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaDontFailOnDataLossSuite.scala
index 0ff341c..39c4e3f 100644
--- a/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaDontFailOnDataLossSuite.scala
+++ b/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaDontFailOnDataLossSuite.scala
@@ -80,7 +80,7 @@ trait KafkaMissingOffsetsTest extends SharedSQLContext {
   }
 }
 
-class KafkaDontFailOnDataLossSuite extends KafkaMissingOffsetsTest {
+class KafkaDontFailOnDataLossSuite extends StreamTest with KafkaMissingOffsetsTest {
 
   import testImplicits._
 
@@ -165,7 +165,11 @@ class KafkaDontFailOnDataLossSuite extends KafkaMissingOffsetsTest {
       .trigger(Trigger.Continuous(100))
       .start()
     try {
-      query.processAllAvailable()
+      // `processAllAvailable` doesn't work for continuous processing, so just wait until the last
+      // record appears in the table.
+      eventually(timeout(streamingTimeout)) {
+        assert(spark.table(table).as[String].collect().contains("49"))
+      }
     } finally {
       query.stop()
     }
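[Editor's note] The fix above replaces `processAllAvailable()` with ScalaTest's `eventually(timeout(...))`, i.e. repeatedly re-checking an assertion until it passes or a deadline expires. As a minimal language-neutral sketch of that poll-until-deadline pattern (the function name, defaults, and the toy `records` list are illustrative, not Spark or ScalaTest code):

```python
import time


def eventually(condition, timeout=5.0, interval=0.1):
    """Poll `condition` until it returns truthy, or raise after `timeout` seconds.

    A hypothetical sketch of the ScalaTest `eventually(timeout(...))` pattern
    used in the patch; not the ScalaTest implementation.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout}s")
        time.sleep(interval)


# Usage: wait until the last record ("49") shows up, mirroring the test's check.
records = [str(i) for i in range(50)]
eventually(lambda: "49" in records, timeout=2.0)
```

The key property, and the reason the patch works for continuous processing, is that the waiter only depends on observable results (the rows in the table), not on the engine's notion of "all available data processed".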
svn commit: r28956 - in /dev/spark/2.4.0-SNAPSHOT-2018_08_25_08_02-6c66ab8-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell
Date: Sat Aug 25 15:19:27 2018
New Revision: 28956

Log:
Apache Spark 2.4.0-SNAPSHOT-2018_08_25_08_02-6c66ab8 docs

[This commit notification would consist of 1478 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]
spark git commit: [SPARK-24688][EXAMPLES] Modify the comments about LabeledPoint
Repository: spark
Updated Branches:
  refs/heads/master 3e4f1666a -> 6c66ab8b3

[SPARK-24688][EXAMPLES] Modify the comments about LabeledPoint

## What changes were proposed in this pull request?

An RDD is created from LabeledPoint, but the comment reads # LabeledPoint(feature, label). Although in the method ChiSquareTest.test the second parameter is the feature and the third is the label, it is better to write label before feature here, because an RDD created from LabeledPoint actually holds (label, feature) pairs. The comment is now changed to LabeledPoint(label, feature). The comments in the Scala and Java examples have the same typo.

## How was this patch tested?

tested

https://issues.apache.org/jira/browse/SPARK-24688

Author: Weizhe Huang 492816239qq.com

Closes #21665 from uzmijnlm/my_change.

Authored-by: Huangweizhe
Signed-off-by: Sean Owen

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6c66ab8b
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6c66ab8b
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6c66ab8b

Branch: refs/heads/master
Commit: 6c66ab8b334c5358bc77995650f1886e4c43231d
Parents: 3e4f166
Author: Huangweizhe
Authored: Sat Aug 25 09:24:20 2018 -0500
Committer: Sean Owen
Committed: Sat Aug 25 09:24:20 2018 -0500

--
 .../spark/examples/mllib/JavaHypothesisTestingExample.java     | 2 +-
 examples/src/main/python/mllib/hypothesis_testing_example.py   | 2 +-
 .../apache/spark/examples/mllib/HypothesisTestingExample.scala | 4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/6c66ab8b/examples/src/main/java/org/apache/spark/examples/mllib/JavaHypothesisTestingExample.java
--
diff --git a/examples/src/main/java/org/apache/spark/examples/mllib/JavaHypothesisTestingExample.java b/examples/src/main/java/org/apache/spark/examples/mllib/JavaHypothesisTestingExample.java
index b48b95f..2732736 100644
--- a/examples/src/main/java/org/apache/spark/examples/mllib/JavaHypothesisTestingExample.java
+++ b/examples/src/main/java/org/apache/spark/examples/mllib/JavaHypothesisTestingExample.java
@@ -67,7 +67,7 @@ public class JavaHypothesisTestingExample {
       )
     );
 
-    // The contingency table is constructed from the raw (feature, label) pairs and used to conduct
+    // The contingency table is constructed from the raw (label, feature) pairs and used to conduct
    // the independence test. Returns an array containing the ChiSquaredTestResult for every feature
    // against the label.
    ChiSqTestResult[] featureTestResults = Statistics.chiSqTest(obs.rdd());

http://git-wip-us.apache.org/repos/asf/spark/blob/6c66ab8b/examples/src/main/python/mllib/hypothesis_testing_example.py
--
diff --git a/examples/src/main/python/mllib/hypothesis_testing_example.py b/examples/src/main/python/mllib/hypothesis_testing_example.py
index e566ead..21a5584 100644
--- a/examples/src/main/python/mllib/hypothesis_testing_example.py
+++ b/examples/src/main/python/mllib/hypothesis_testing_example.py
@@ -51,7 +51,7 @@ if __name__ == "__main__":
     [LabeledPoint(1.0, [1.0, 0.0, 3.0]),
      LabeledPoint(1.0, [1.0, 2.0, 0.0]),
      LabeledPoint(1.0, [-1.0, 0.0, -0.5])]
-    )  # LabeledPoint(feature, label)
+    )  # LabeledPoint(label, feature)
 
     # The contingency table is constructed from an RDD of LabeledPoint and used to conduct
     # the independence test. Returns an array containing the ChiSquaredTestResult for every feature

http://git-wip-us.apache.org/repos/asf/spark/blob/6c66ab8b/examples/src/main/scala/org/apache/spark/examples/mllib/HypothesisTestingExample.scala
--
diff --git a/examples/src/main/scala/org/apache/spark/examples/mllib/HypothesisTestingExample.scala b/examples/src/main/scala/org/apache/spark/examples/mllib/HypothesisTestingExample.scala
index add1719..9b3c326 100644
--- a/examples/src/main/scala/org/apache/spark/examples/mllib/HypothesisTestingExample.scala
+++ b/examples/src/main/scala/org/apache/spark/examples/mllib/HypothesisTestingExample.scala
@@ -61,9 +61,9 @@ object HypothesisTestingExample {
         LabeledPoint(-1.0, Vectors.dense(-1.0, 0.0, -0.5)
         )
       )
-    ) // (feature, label) pairs.
+    ) // (label, feature) pairs.
 
-    // The contingency table is constructed from the raw (feature, label) pairs and used to conduct
+    // The contingency table is constructed from the raw (label, feature) pairs and used to conduct
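[Editor's note] The comment fix above turns on the fact that a LabeledPoint carries (label, features) in that order, and that chiSqTest builds one contingency table per feature, counting feature values against labels. A minimal hypothetical sketch of that counting step in plain Python (the function, the toy data, and the dict-of-counts representation are illustrative, not Spark's implementation):

```python
from collections import Counter


def contingency_tables(points):
    """Build one (feature value, label) count table per feature column.

    `points` is a list of (label, features) pairs, mirroring the
    LabeledPoint(label, features) ordering the patch documents.
    """
    num_features = len(points[0][1])
    tables = [Counter() for _ in range(num_features)]
    for label, features in points:
        for i, value in enumerate(features):
            tables[i][(value, label)] += 1
    return tables


# (label, features) pairs -- label first, as in LabeledPoint.
points = [
    (0.0, [0.5, 10.0]),
    (0.0, [1.5, 20.0]),
    (1.0, [1.5, 30.0]),
]
tables = contingency_tables(points)
# tables[0][(1.5, 0.0)] counts rows where feature 0 == 1.5 and label == 0.0
```

Swapping the two tuple members, as the old comment suggested, would tabulate features against features, which is why the wrong ordering in the example comments was misleading.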
svn commit: r28954 - in /dev/spark/2.4.0-SNAPSHOT-2018_08_25_00_02-3e4f166-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s
Author: pwendell
Date: Sat Aug 25 07:19:12 2018
New Revision: 28954

Log:
Apache Spark 2.4.0-SNAPSHOT-2018_08_25_00_02-3e4f166 docs

[This commit notification would consist of 1478 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]