[GitHub] spark issue #13834: [SPARK-16339] [CORE] ScriptTransform does not print stde...
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/13834 @srowen : In case of exception, we `destroy()` the `proc` which cleans up all the associated streams : http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8u40-b25/java/lang/UNIXProcess.java#428 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13976: [SPARK-16288][SQL] Implement inline table generat...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13976#discussion_r69301324 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/GeneratorFunctionSuite.scala --- @@ -89,4 +91,30 @@ class GeneratorFunctionSuite extends QueryTest with SharedSQLContext { exploded.join(exploded, exploded("i") === exploded("i")).agg(count("*")), Row(3) :: Nil) } + + test("inline with empty table or empty array") { --- End diff -- the test name is misleading: we do allow empty array, the problem is `array()` returns an array of null, which fails the type check.
[GitHub] spark pull request #13976: [SPARK-16288][SQL] Implement inline table generat...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13976#discussion_r69301056 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/GeneratorExpressionSuite.scala --- @@ -68,4 +69,23 @@ class GeneratorExpressionSuite extends SparkFunSuite with ExpressionEvalHelper { PosExplode(CreateArray(str_array.map(Literal(_, str_correct_answer.map(InternalRow.fromSeq(_))) } + + test("inline") { +val correct_answer = Seq( + Seq(0, UTF8String.fromString("a")), --- End diff -- we can create a row directly in test: call `create_row(...)`
[GitHub] spark pull request #13976: [SPARK-16288][SQL] Implement inline table generat...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13976#discussion_r69301097 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/GeneratorExpressionSuite.scala --- @@ -68,4 +69,23 @@ class GeneratorExpressionSuite extends SparkFunSuite with ExpressionEvalHelper { PosExplode(CreateArray(str_array.map(Literal(_, str_correct_answer.map(InternalRow.fromSeq(_))) } + + test("inline") { +val correct_answer = Seq( + Seq(0, UTF8String.fromString("a")), --- End diff -- and it can help us convert string to UTF8String
[GitHub] spark pull request #13976: [SPARK-16288][SQL] Implement inline table generat...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13976#discussion_r69300935 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/GeneratorExpressionSuite.scala --- @@ -68,4 +69,23 @@ class GeneratorExpressionSuite extends SparkFunSuite with ExpressionEvalHelper { PosExplode(CreateArray(str_array.map(Literal(_, str_correct_answer.map(InternalRow.fromSeq(_))) } + + test("inline") { +val correct_answer = Seq( + Seq(0, UTF8String.fromString("a")), + Seq(1, UTF8String.fromString("b")), + Seq(2, UTF8String.fromString("c"))) + +checkTuple( + Inline(Literal.create(Array(), ArrayType(StructType(Seq(StructField("id1", LongType)), --- End diff -- we usually use `new StructType().add("id", LongType)` to create struct type
[GitHub] spark pull request #13976: [SPARK-16288][SQL] Implement inline table generat...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13976#discussion_r69300771 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala --- @@ -195,3 +195,42 @@ case class Explode(child: Expression) extends ExplodeBase(child, position = fals extended = "> SELECT _FUNC_(array(10,20));\n 0\t10\n 1\t20") // scalastyle:on line.size.limit case class PosExplode(child: Expression) extends ExplodeBase(child, position = true) + +/** + * Explodes an array of structs into a table. + */ +@ExpressionDescription( + usage = "_FUNC_(a) - Explodes an array of structs into a table.", + extended = "> SELECT _FUNC_(array(struct(1, 'a'), struct(2, 'b')));\n [1,a]\n[2,b]") +case class Inline(child: Expression) extends UnaryExpression with Generator with CodegenFallback { + + override def children: Seq[Expression] = child :: Nil + + override def checkInputDataTypes(): TypeCheckResult = child.dataType match { +case ArrayType(et, _) if et.isInstanceOf[StructType] => + TypeCheckResult.TypeCheckSuccess +case _ => + TypeCheckResult.TypeCheckFailure( +s"input to function inline should be array of struct type, not ${child.dataType}") + } + + override def elementSchema: StructType = child.dataType match { +case ArrayType(et : StructType, _) => + StructType(et.fields.zipWithIndex.map { +case (field, index) => StructField(field.name, field.dataType, nullable = field.nullable) + }) + } + + private lazy val ncol = elementSchema.fields.length --- End diff -- I'd like to name it `numFields`
[GitHub] spark pull request #13976: [SPARK-16288][SQL] Implement inline table generat...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13976#discussion_r69300727 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala --- @@ -195,3 +195,42 @@ case class Explode(child: Expression) extends ExplodeBase(child, position = fals extended = "> SELECT _FUNC_(array(10,20));\n 0\t10\n 1\t20") // scalastyle:on line.size.limit case class PosExplode(child: Expression) extends ExplodeBase(child, position = true) + +/** + * Explodes an array of structs into a table. + */ +@ExpressionDescription( + usage = "_FUNC_(a) - Explodes an array of structs into a table.", + extended = "> SELECT _FUNC_(array(struct(1, 'a'), struct(2, 'b')));\n [1,a]\n[2,b]") +case class Inline(child: Expression) extends UnaryExpression with Generator with CodegenFallback { + + override def children: Seq[Expression] = child :: Nil + + override def checkInputDataTypes(): TypeCheckResult = child.dataType match { +case ArrayType(et, _) if et.isInstanceOf[StructType] => + TypeCheckResult.TypeCheckSuccess +case _ => + TypeCheckResult.TypeCheckFailure( +s"input to function inline should be array of struct type, not ${child.dataType}") + } + + override def elementSchema: StructType = child.dataType match { +case ArrayType(et : StructType, _) => + StructType(et.fields.zipWithIndex.map { +case (field, index) => StructField(field.name, field.dataType, nullable = field.nullable) + }) + } + + private lazy val ncol = elementSchema.fields.length + + override def eval(input: InternalRow): TraversableOnce[InternalRow] = child.dataType match { --- End diff -- Why do we pattern match here?
[GitHub] spark pull request #13976: [SPARK-16288][SQL] Implement inline table generat...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13976#discussion_r69300459 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala --- @@ -195,3 +195,42 @@ case class Explode(child: Expression) extends ExplodeBase(child, position = fals extended = "> SELECT _FUNC_(array(10,20));\n 0\t10\n 1\t20") // scalastyle:on line.size.limit case class PosExplode(child: Expression) extends ExplodeBase(child, position = true) + +/** + * Explodes an array of structs into a table. + */ +@ExpressionDescription( + usage = "_FUNC_(a) - Explodes an array of structs into a table.", + extended = "> SELECT _FUNC_(array(struct(1, 'a'), struct(2, 'b')));\n [1,a]\n[2,b]") +case class Inline(child: Expression) extends UnaryExpression with Generator with CodegenFallback { + + override def children: Seq[Expression] = child :: Nil + + override def checkInputDataTypes(): TypeCheckResult = child.dataType match { +case ArrayType(et, _) if et.isInstanceOf[StructType] => + TypeCheckResult.TypeCheckSuccess +case _ => + TypeCheckResult.TypeCheckFailure( +s"input to function inline should be array of struct type, not ${child.dataType}") + } + + override def elementSchema: StructType = child.dataType match { +case ArrayType(et : StructType, _) => + StructType(et.fields.zipWithIndex.map { --- End diff -- hmm, so it's just `et` now?
[GitHub] spark issue #13981: [SPARK-16307] [ML] Add test to verify the predicted vari...
Github user sethah commented on the issue: https://github.com/apache/spark/pull/13981 @MechCoder Thanks for adding this! I think it's a good test to protect against silent failures in the future. I just left a few small comments.
[GitHub] spark pull request #13981: [SPARK-16307] [ML] Add test to verify the predict...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/13981#discussion_r69298953 --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/DecisionTreeRegressorSuite.scala --- @@ -96,6 +108,15 @@ class DecisionTreeRegressorSuite assert(variance === expectedVariance, s"Expected variance $expectedVariance but got $variance.") } + +val toyDF = TreeTests.setMetadata(toyData, Map.empty[Int, Int], 0) +dt.setMaxDepth(1) + .setMaxBins(6) + .setSeed(0) +val expectVariances = dt.fit(toyDF).transform(toyDF).select("variance").collect().map { + case Row(variance: Double) => variance } +val trueVariances = Array(0.667, 0.667, 0.667, 2.667, 2.667, 2.667) +trueVariances.zip(expectVariances).foreach(x => x._1 ~== x._2 absTol 1e-3) --- End diff -- Although this technically works, it is less confusing if we use `assert` and unpack the tuple, like:
```scala
...foreach { case (actual, expected) => assert(actual ~== expected absTol 1e-3) }
```
[GitHub] spark pull request #13981: [SPARK-16307] [ML] Add test to verify the predict...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/13981#discussion_r69298648 --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/DecisionTreeRegressorSuite.scala --- @@ -96,6 +108,15 @@ class DecisionTreeRegressorSuite assert(variance === expectedVariance, s"Expected variance $expectedVariance but got $variance.") } + +val toyDF = TreeTests.setMetadata(toyData, Map.empty[Int, Int], 0) +dt.setMaxDepth(1) + .setMaxBins(6) + .setSeed(0) +val expectVariances = dt.fit(toyDF).transform(toyDF).select("variance").collect().map { --- End diff -- `expectedVariances` and `trueVariances` are mixed up here. Expected should be the theoretical value computed below. Also, it would be good to leave a comment explaining where those expected values came from.
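To illustrate the two review points above (name the theoretical values "expected", and document where they come from), here is a minimal sketch in plain Scala with no Spark dependency. The leaf labels `leftLeaf` and `rightLeaf` are assumptions chosen only because their population variances round to 0.667 and 2.667; the actual `toyData` is not shown in this thread. `VarianceCheckSketch`, `variance`, and `approxEqual` are hypothetical names, and `approxEqual` merely stands in for the `~==` / `absTol` test helper.

```scala
object VarianceCheckSketch {
  // Population variance: mean of squared deviations from the mean.
  def variance(xs: Seq[Double]): Double = {
    val mean = xs.sum / xs.size
    xs.map(x => (x - mean) * (x - mean)).sum / xs.size
  }

  // ASSUMPTION: labels that fall into each leaf of a depth-1 tree.
  // These are illustrative values, not the PR's toyData.
  val leftLeaf: Seq[Double] = Seq(1.0, 2.0, 3.0)
  val rightLeaf: Seq[Double] = Seq(10.0, 12.0, 14.0)

  // Expected (theoretical) values: each row is predicted with the
  // variance of the leaf it belongs to.
  val expectedVariances: Seq[Double] =
    Seq.fill(leftLeaf.size)(variance(leftLeaf)) ++
      Seq.fill(rightLeaf.size)(variance(rightLeaf))

  // Hypothetical stand-in for an absolute-tolerance comparison.
  def approxEqual(a: Double, b: Double, absTol: Double): Boolean =
    math.abs(a - b) <= absTol

  // Compare actual model output against the expected values,
  // unpacking the tuple and asserting explicitly.
  def check(actualVariances: Seq[Double]): Unit =
    actualVariances.zip(expectedVariances).foreach {
      case (actual, expected) =>
        assert(approxEqual(actual, expected, 1e-3),
          s"Expected variance $expected but got $actual.")
    }
}
```

Deriving the expected values from a small helper, rather than hard-coding rounded literals, keeps the origin of the numbers visible in the test itself.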
[GitHub] spark issue #14010: [GRAPHX][EXAMPLES] move graphx test data directory and u...
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14010 @srowen Done.
[GitHub] spark issue #13976: [SPARK-16288][SQL] Implement inline table generating fun...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13976 Merged build finished. Test PASSed.
[GitHub] spark issue #13976: [SPARK-16288][SQL] Implement inline table generating fun...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13976 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61615/ Test PASSed.
[GitHub] spark pull request #13981: [SPARK-16307] [ML] Add test to verify the predict...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/13981#discussion_r69298384 --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/DecisionTreeRegressorSuite.scala --- @@ -96,6 +108,15 @@ class DecisionTreeRegressorSuite assert(variance === expectedVariance, s"Expected variance $expectedVariance but got $variance.") } + +val toyDF = TreeTests.setMetadata(toyData, Map.empty[Int, Int], 0) +dt.setMaxDepth(1) + .setMaxBins(6) --- End diff -- Not sure why we need to set maxBins here?
[GitHub] spark issue #14010: [GRAPHX][EXAMPLES] move graphx test data directory and u...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14010 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61614/ Test PASSed.
[GitHub] spark issue #14010: [GRAPHX][EXAMPLES] move graphx test data directory and u...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14010 Merged build finished. Test PASSed.
[GitHub] spark issue #13976: [SPARK-16288][SQL] Implement inline table generating fun...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13976 **[Test build #61615 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61615/consoleFull)** for PR 13976 at commit [`9382f64`](https://github.com/apache/spark/commit/9382f64a19c9671a679a75ce22b801aa32576da5). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class Inline(child: Expression) extends UnaryExpression with Generator with CodegenFallback `
[GitHub] spark issue #14010: [GRAPHX][EXAMPLES] move graphx test data directory and u...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14010 **[Test build #61614 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61614/consoleFull)** for PR 14010 at commit [`9fc83f6`](https://github.com/apache/spark/commit/9fc83f6c086eafcd58523234c4a95eb25158632b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14014: [SPARK-16344][SQL] Decoding Parquet array of struct with...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14014 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61616/ Test PASSed.
[GitHub] spark issue #14014: [SPARK-16344][SQL] Decoding Parquet array of struct with...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14014 Merged build finished. Test PASSed.
[GitHub] spark issue #14013: [SPARK-16344][SQL][BRANCH-1.6] Decoding Parquet array of...
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14013 @rdblue Verified that parquet-avro also suffers from this issue. Filed [PARQUET-651][1] to track it. [1]: https://issues.apache.org/jira/browse/PARQUET-651
[GitHub] spark issue #14015: [SPARK-16345][Documentation][Examples][GraphX] Extract g...
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14015 @srowen Yes, the example code is exactly the same as that in the GraphX doc, and I tested all of the examples; they run normally.
[GitHub] spark issue #14014: [SPARK-16344][SQL] Decoding Parquet array of struct with...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14014 **[Test build #61616 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61616/consoleFull)** for PR 14014 at commit [`3bfe45f`](https://github.com/apache/spark/commit/3bfe45fe8b81f44141b737df6b292f12cd37d06a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...
Github user janplus commented on a diff in the pull request: https://github.com/apache/spark/pull/14008#discussion_r69297640 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/StringExpressionsSuite.scala --- @@ -725,4 +725,43 @@ class StringExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper { checkEvaluation(FindInSet(Literal("abf"), Literal("abc,b,ab,c,def")), 0) checkEvaluation(FindInSet(Literal("ab,"), Literal("abc,b,ab,c,def")), 0) } + + test("ParseUrl") { +def checkParseUrl(expected: String, urlStr: String, partToExtract: String): Unit = { + checkEvaluation( +ParseUrl(Literal.create(urlStr, StringType), Literal.create(partToExtract, StringType)), +expected) +} +def checkParseUrlWithKey(expected: String, urlStr: String, + partToExtract: String, key: String): Unit = { + checkEvaluation( +ParseUrl(Literal.create(urlStr, StringType), Literal.create(partToExtract, StringType), + Literal.create(key, StringType)), expected) +} + +checkParseUrl("spark.apache.org", "http://spark.apache.org/path?query=1", "HOST") +checkParseUrl("/path", "http://spark.apache.org/path?query=1", "PATH") +checkParseUrl("query=1", "http://spark.apache.org/path?query=1", "QUERY") +checkParseUrl("Ref", "http://spark.apache.org/path?query=1#Ref", "REF") +checkParseUrl("http", "http://spark.apache.org/path?query=1", "PROTOCOL") +checkParseUrl("/path?query=1", "http://spark.apache.org/path?query=1", "FILE") +checkParseUrl("spark.apache.org:8080", "http://spark.apache.org:8080/path?query=1", "AUTHORITY") +checkParseUrl("jian", "http://j...@spark.apache.org/path?query=1", "USERINFO") +checkParseUrlWithKey("1", "http://spark.apache.org/path?query=1", "QUERY", "query") + +// Null checking +checkParseUrl(null, null, "HOST") +checkParseUrl(null, "http://spark.apache.org/path?query=1", null) +checkParseUrl(null, null, null) +checkParseUrl(null, "test", "HOST") +checkParseUrl(null, "http://spark.apache.org/path?query=1", "NO") +checkParseUrlWithKey(null, "http://spark.apache.org/path?query=1", "HOST", "query") +checkParseUrlWithKey(null, "http://spark.apache.org/path?query=1", "QUERY", "quer") +checkParseUrlWithKey(null, "http://spark.apache.org/path?query=1", "QUERY", null) --- End diff -- I am not sure. Is there any exceptional case?
[GitHub] spark issue #14008: [SPARK-16281][SQL] Implement parse_url SQL function
Github user janplus commented on the issue: https://github.com/apache/spark/pull/14008 @rxin and @dongjoon-hyun Thanks for your review. I have added a new commit which does the following things:
1. Put the `parse_url` function in the right order.
2. Use `""" """` instead of `+` in the `extended` part to work with Scala 2.10.
3. Remove unnecessary `lazy`s.
4. Correct `REGEXPREFIX` and add a new null test case.
5. Use `NonFatal(_)` instead of the specified exception.
6. Fix the indentation problems.
I have tried not to use varargs, but a separate constructor that accepts two args does not help, as there isn't a magic key that makes `parse_url(url, partToExtract, magic key)` be treated as `parse_url(url, partToExtract)`.
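The part names exercised in the `ParseUrl` test above correspond closely to the accessors of `java.net.URL`, so the expected behavior can be sketched in plain Scala without Spark. This is only an illustration under stated assumptions, not the PR's `ParseUrl` expression: `ParseUrlSketch` and `parseUrl` are hypothetical names, `Option` is used in place of SQL null, and the key-lookup regex is an assumed pattern rather than the PR's `REGEXPREFIX`. It does show the `NonFatal(_)` style of handling a malformed URL mentioned in item 5.

```scala
import java.net.URL
import java.util.regex.Pattern
import scala.util.control.NonFatal

object ParseUrlSketch {
  // Hypothetical helper: returns None where the SQL function would return null.
  def parseUrl(url: String, part: String): Option[String] = {
    if (url == null || part == null) {
      None
    } else {
      try {
        val u = new URL(url) // throws MalformedURLException for input like "test"
        val value = part match {
          case "HOST"      => u.getHost
          case "PATH"      => u.getPath
          case "QUERY"     => u.getQuery
          case "REF"       => u.getRef
          case "PROTOCOL"  => u.getProtocol
          case "FILE"      => u.getFile
          case "AUTHORITY" => u.getAuthority
          case "USERINFO"  => u.getUserInfo
          case _           => null // unknown part name
        }
        Option(value)
      } catch {
        // Any non-fatal failure while parsing yields "no result".
        case NonFatal(_) => None
      }
    }
  }

  // Three-argument form: a key is only meaningful for the QUERY part.
  def parseUrl(url: String, part: String, key: String): Option[String] = {
    if (key == null || part != "QUERY") {
      None
    } else {
      parseUrl(url, part).flatMap { query =>
        // ASSUMPTION: match "key=value" at the start of the query or after '&'.
        val matcher =
          Pattern.compile("(&|^)" + Pattern.quote(key) + "=([^&]*)").matcher(query)
        if (matcher.find()) Some(matcher.group(2)) else None
      }
    }
  }
}
```

For example, `ParseUrlSketch.parseUrl("http://spark.apache.org/path?query=1", "QUERY", "query")` looks up the value of the `query` key, while the same call with key `"quer"` finds no match because the pattern requires `=` immediately after the key.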
[GitHub] spark issue #14010: [GRAPHX][EXAMPLES] move graphx test data directory and u...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14010 Merged build finished. Test PASSed.
[GitHub] spark issue #14010: [GRAPHX][EXAMPLES] move graphx test data directory and u...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14010 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61613/ Test PASSed.
[GitHub] spark issue #14010: [GRAPHX][EXAMPLES] move graphx test data directory and u...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14010 **[Test build #61613 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61613/consoleFull)** for PR 14010 at commit [`9fc83f6`](https://github.com/apache/spark/commit/9fc83f6c086eafcd58523234c4a95eb25158632b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #13136: [SPARK-15350][mllib]add unit test function for Lo...
Github user WeichenXu123 closed the pull request at: https://github.com/apache/spark/pull/13136
[GitHub] spark issue #14015: [SPARK-16345][Documentation][Examples][GraphX] Extract g...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14015 No changes to the code itself (except perhaps style fixes)? OK.
[GitHub] spark issue #13494: [SPARK-15752] [SQL] support optimization for metadata on...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13494 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61610/ Test PASSed.
[GitHub] spark issue #13494: [SPARK-15752] [SQL] support optimization for metadata on...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13494 Merged build finished. Test PASSed.
[GitHub] spark issue #14013: [SPARK-16344][SQL][BRANCH-1.6] Decoding Parquet array of...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14013 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61612/ Test PASSed.
[GitHub] spark issue #13494: [SPARK-15752] [SQL] support optimization for metadata on...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13494 **[Test build #61610 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61610/consoleFull)** for PR 13494 at commit [`2568193`](https://github.com/apache/spark/commit/2568193f91b9ae129c19a67bfd514065215840ac). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class MetadataOnlyOptimizerSuite extends QueryTest with SharedSQLContext`
[GitHub] spark issue #14013: [SPARK-16344][SQL][BRANCH-1.6] Decoding Parquet array of...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14013 Merged build finished. Test PASSed.
[GitHub] spark issue #14013: [SPARK-16344][SQL][BRANCH-1.6] Decoding Parquet array of...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14013 **[Test build #61612 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61612/consoleFull)** for PR 14013 at commit [`c40bccb`](https://github.com/apache/spark/commit/c40bccb631c2175d375e7c2e6ba83d1b831768af). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #9207: [SPARK-11171][SPARK-11237][SPARK-11241][ML] Try adding PM...
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/9207 @holden that's true for the fully generic approach. But `DataFrameWriter`, for example, exposes `json` as a shortcut (type-safe in a way) for `format("json")`. I think we can achieve something similar here. Since the `MLWriter` implementation is different for each model, we can attach a `PMML` trait to those that support it, enabling the "type-safe" approach: `model.write.save("/path")` writes the default built-in format, while `model.write.pmml.save("/path")` writes PMML for those models that actually support it via the trait. For the generic approach, it's possible to do `model.write.format("pmml").save(...)`, which would then fail at runtime if not supported, while `model.write.format("my.custom.format").save(...)` could allow plugging in writers, similar to the datasource API... Just thoughts; obviously more work will be required to see if it is feasible in practice.
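The two styles described in the comment above can be sketched roughly as follows. This is only an illustration of the idea, not Spark's `MLWriter` API: `SketchWriter`, `PmmlSupport`, `ModelWithPmml` and `ModelWithoutPmml` are hypothetical names, and `save` merely returns a description instead of writing anything.

```scala
// Hypothetical, simplified stand-ins; none of these are Spark's real classes.
object WriterSketch {
  // Minimal writer: remembers a format name and "saves" by returning a description.
  class SketchWriter(val formatName: String, supported: Set[String]) {
    // Generic, string-based entry point: an unsupported format fails at runtime.
    def format(name: String): SketchWriter = {
      require(supported.contains(name), s"format '$name' is not supported by this model")
      new SketchWriter(name, supported)
    }

    def save(path: String): String = s"saved as $formatName to $path"
  }

  // Mixed into the writers of models that can export PMML; provides the
  // "type-safe" shortcut, so `.pmml` only compiles where it is supported.
  trait PmmlSupport { self: SketchWriter =>
    def pmml: SketchWriter = format("pmml")
  }

  class ModelWithPmml {
    def write: SketchWriter with PmmlSupport =
      new SketchWriter("native", Set("native", "pmml")) with PmmlSupport
  }

  class ModelWithoutPmml {
    def write: SketchWriter = new SketchWriter("native", Set("native"))
  }
}
```

With this shape, `new ModelWithoutPmml().write.pmml` does not compile, whereas `new ModelWithoutPmml().write.format("pmml")` compiles and throws when called, which mirrors the compile-time versus runtime trade-off raised in the comment.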
[GitHub] spark issue #14015: [SPARK-16345][Documentation][Examples][GraphX] Extract g...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14015 Merged build finished. Test PASSed.
[GitHub] spark issue #14015: [SPARK-16345][Documentation][Examples][GraphX] Extract g...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14015 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61618/ Test PASSed.
[GitHub] spark issue #14015: [SPARK-16345][Documentation][Examples][GraphX] Extract g...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14015 **[Test build #61618 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61618/consoleFull)** for PR 14015 at commit [`e9a096c`](https://github.com/apache/spark/commit/e9a096c8c7d1600ff4000560bd195db9a77a1046). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14015: [SPARK-16345][Documentation][Examples][GraphX] Extract g...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14015 **[Test build #61618 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61618/consoleFull)** for PR 14015 at commit [`e9a096c`](https://github.com/apache/spark/commit/e9a096c8c7d1600ff4000560bd195db9a77a1046).
[GitHub] spark issue #14015: [SPARK-16345][Documentation][Examples][GraphX] Extract g...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14015 **[Test build #61617 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61617/consoleFull)** for PR 14015 at commit [`689d2f6`](https://github.com/apache/spark/commit/689d2f67e15c4e7d6b5b184712172bc46bce2128). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14015: [SPARK-16345][Documentation][Examples][GraphX] Extract g...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14015 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61617/ Test FAILed.
[GitHub] spark issue #14015: [SPARK-16345][Documentation][Examples][GraphX] Extract g...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14015 Merged build finished. Test FAILed.
[GitHub] spark issue #14015: [SPARK-16345][Documentation][Examples][GraphX] Extract g...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14015 **[Test build #61617 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61617/consoleFull)** for PR 14015 at commit [`689d2f6`](https://github.com/apache/spark/commit/689d2f67e15c4e7d6b5b184712172bc46bce2128).
[GitHub] spark pull request #14015: [SPARK-16345][Documentation][Examples][GraphX] Ex...
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/14015 [SPARK-16345][Documentation][Examples][GraphX] Extract graphx programming guide example snippets from source files instead of hard code them ## What changes were proposed in this pull request? I extracted six example programs from the GraphX programming guide and replaced them with `include_example` labels. The six example programs are: - AggregateMessagesExample.scala - SSSPExample.scala - TriangleCountingExample.scala - ConnectedComponentsExample.scala - ComprehensiveExample.scala - PageRankExample.scala All of the example code can be run using `bin/run-example graphx.EXAMPLE_NAME`. ## How was this patch tested? Manual. You can merge this pull request into a Git repository by running: $ git pull https://github.com/WeichenXu123/spark graphx_example_plugin Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14015.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14015 commit 689d2f67e15c4e7d6b5b184712172bc46bce2128 Author: WeichenXu Date: 2016-07-01T13:37:52Z add graphx example.4
[GitHub] spark issue #14013: [SPARK-16344][SQL][BRANCH-1.6] Decoding Parquet array of...
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14013 @rdblue Would you mind helping to review this one? My initial investigation suggests that parquet-avro probably suffers from the same issue. I will file a parquet-mr JIRA ticket soon.
[GitHub] spark issue #14014: [SPARK-16344][SQL] Decoding Parquet array of struct with...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14014 **[Test build #61616 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61616/consoleFull)** for PR 14014 at commit [`3bfe45f`](https://github.com/apache/spark/commit/3bfe45fe8b81f44141b737df6b292f12cd37d06a).
[GitHub] spark issue #14014: [SPARK-16344][SQL] Decoding Parquet array of struct with...
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14014 cc @yhuai
[GitHub] spark pull request #14014: [SPARK-16344][SQL] Decoding Parquet array of stru...
GitHub user liancheng opened a pull request: https://github.com/apache/spark/pull/14014 [SPARK-16344][SQL] Decoding Parquet array of struct with a single field named "element" ## What changes were proposed in this pull request? This PR ports #14013 to master and branch-2.0. ## How was this patch tested? See #14013. You can merge this pull request into a Git repository by running: $ git pull https://github.com/liancheng/spark spark-16344-for-master-and-2.0 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14014.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14014 commit 3bfe45fe8b81f44141b737df6b292f12cd37d06a Author: Cheng Lian Date: 2016-07-01T11:32:52Z Fixes SPARK-16344
[GitHub] spark issue #14004: [SPARK-16285][SQL] Implement sentences SQL functions
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14004 cc @rxin and @cloud-fan
[GitHub] spark issue #13976: [SPARK-16288][SQL] Implement inline table generating fun...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13976 Thank you, @cloud-fan. :) I've learned a lot in this PR again. The following changes are applied: - Use `inputArray.getStruct(i, ncol)` - Keep the original field name - Fix the elementSchema generation style - Add a column-based test, an `Array()` expression-level test, and an empty-row test.
[GitHub] spark issue #13976: [SPARK-16288][SQL] Implement inline table generating fun...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13976 **[Test build #61615 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61615/consoleFull)** for PR 13976 at commit [`9382f64`](https://github.com/apache/spark/commit/9382f64a19c9671a679a75ce22b801aa32576da5).
[GitHub] spark issue #13925: [SPARK-16226][SQL]change the way of JDBC commit
Github user srowen commented on the issue: https://github.com/apache/spark/pull/13925 I think the important change regards the transaction isolation level that's in effect here, but yes that change is also a good one.
[GitHub] spark issue #14013: [SPARK-16344][SQL] Decoding Parquet array of struct with...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14013 Merged build finished. Test FAILed.
[GitHub] spark issue #14013: [SPARK-16344][SQL] Decoding Parquet array of struct with...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14013 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61611/ Test FAILed.
[GitHub] spark issue #14013: [SPARK-16344][SQL] Decoding Parquet array of struct with...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14013 **[Test build #61611 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61611/consoleFull)** for PR 14013 at commit [`9620b48`](https://github.com/apache/spark/commit/9620b48d463ed2f2a8ede7397420050dc1e7d832). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14010: [GRAPHX][EXAMPLES] move graphx test data directory and u...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14010 **[Test build #61614 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61614/consoleFull)** for PR 14010 at commit [`9fc83f6`](https://github.com/apache/spark/commit/9fc83f6c086eafcd58523234c4a95eb25158632b).
[GitHub] spark issue #14010: [GRAPHX][EXAMPLES] move graphx test data directory and u...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14010 **[Test build #61613 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61613/consoleFull)** for PR 14010 at commit [`9fc83f6`](https://github.com/apache/spark/commit/9fc83f6c086eafcd58523234c4a95eb25158632b).
[GitHub] spark issue #14010: [GRAPHX][EXAMPLES] move graphx test data directory and u...
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14010 Jenkins, test this please.
[GitHub] spark issue #14013: [SPARK-16344][SQL] Decoding Parquet array of struct with...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14013 **[Test build #61612 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61612/consoleFull)** for PR 14013 at commit [`c40bccb`](https://github.com/apache/spark/commit/c40bccb631c2175d375e7c2e6ba83d1b831768af).
[GitHub] spark issue #14010: [GRAPHX][EXAMPLES] move graphx test data directory and u...
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14010 Jenkins retest this please.
[GitHub] spark issue #14013: [SPARK-16344][SQL] Decoding Parquet array of struct with...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14013 **[Test build #61611 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61611/consoleFull)** for PR 14013 at commit [`9620b48`](https://github.com/apache/spark/commit/9620b48d463ed2f2a8ede7397420050dc1e7d832).
[GitHub] spark issue #14013: [SPARK-16344][SQL] Decoding Parquet array of struct with...
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14013 cc @yhuai
[GitHub] spark pull request #14013: [SPARK-16344][SQL] Decoding Parquet array of stru...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/14013#discussion_r69283877 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/CatalystRowConverter.scala --- @@ -481,13 +481,106 @@ private[parquet] class CatalystRowConverter( */ // scalastyle:on private def isElementType( -parquetRepeatedType: Type, catalystElementType: DataType, parentName: String): Boolean = { +parquetRepeatedType: Type, catalystElementType: DataType, parent: GroupType): Boolean = { + + def isStandardListLayout(t: GroupType): Boolean = +Option(parent.getOriginalType) == Some(LIST) && + t.getFieldCount == 1 && + t.getName == "list" && + t.getFieldName(0) == "element" + (parquetRepeatedType, catalystElementType) match { -case (t: PrimitiveType, _) => true -case (t: GroupType, _) if t.getFieldCount > 1 => true -case (t: GroupType, _) if t.getFieldCount == 1 && t.getName == "array" => true -case (t: GroupType, _) if t.getFieldCount == 1 && t.getName == parentName + "_tuple" => true -case (t: GroupType, StructType(Array(f))) if f.name == t.getFieldName(0) => true +case (t: PrimitiveType, _) => + // For legacy 2-level list types with primitive element type, e.g.: + // + //// List (nullable list, non-null elements) + //optional group my_list (LIST) { + // repeated int32 element; + //} + true + +case (t: GroupType, _) if t.getFieldCount > 1 => + // For legacy 2-level list types whose element type is a group type with 2 or more fields, + // e.g.: + // + //// List> (nullable list, non-null elements) + //optional group my_list (LIST) { + // repeated group element { + //required binary str (UTF8); + //required int32 num; + // }; + //} + true + +case (t: GroupType, _) if t.getFieldCount == 1 && t.getName == "array" => + // For Parquet data generated by parquet-thrift, e.g.: + // + //// List (nullable list, non-null elements) + //optional group my_list (LIST) { + // repeated group my_list_tuple { + //required binary str 
(UTF8); + // }; + //} + true + +case (t: GroupType, _) if t.getFieldCount == 1 && t.getName == parent + "_tuple" => + // For Parquet data generated by parquet-thrift, e.g.: + // + //// List (nullable list, non-null elements) + //optional group my_list (LIST) { + // repeated group my_list_tuple { + //required binary str (UTF8); + // }; + //} + true + +case (t: GroupType, _) if isStandardListLayout(t) => + // For standard 3-level list types, e.g.: + // + //// List (list nullable, elements non-null) + //optional group my_list (LIST) { + // repeated group list { + //required binary element (UTF8); + // } + //} + // + // This case branch must appear before the next one. See comments of the next case branch + // for details. + false --- End diff -- This case branch is essential for the bug fix. Basically, it matches the standard 3-level layout first before trying to match the legacy 2-level layout, so that the "element" syntactic group in Parquet LIST won't be mistaken for the "element" field in the nested struct.
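The ordering argument in the comment above can be sketched outside Spark. The following is a minimal Python model, not the `CatalystRowConverter` code: `Group` is a hypothetical stand-in for a Parquet group type, `catalyst_struct_fields` is a hypothetical list of the Catalyst struct's field names, and the final name-based struct match is assumed to be the "next case branch" the diff comment refers to.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Group:
    """Hypothetical stand-in for a Parquet group type (not the parquet-mr API)."""
    name: str
    fields: List[str]                    # names of the child fields
    original_type: Optional[str] = None  # e.g. "LIST" on the outer group


def is_standard_list_layout(parent: Group, repeated: Group) -> bool:
    # Standard 3-level layout:
    #   <parent> (LIST) { repeated group list { <type> element; } }
    return (parent.original_type == "LIST"
            and len(repeated.fields) == 1
            and repeated.name == "list"
            and repeated.fields[0] == "element")


def is_element_type(parent: Group, repeated: Group,
                    catalyst_struct_fields: Optional[List[str]]) -> bool:
    """Return True if `repeated` is itself the element (legacy 2-level layout)."""
    if len(repeated.fields) > 1:
        return True
    if repeated.name == "array" or repeated.name == parent.name + "_tuple":
        return True
    if is_standard_list_layout(parent, repeated):
        # Checked before the name-based match below, so the syntactic
        # "element" group is not mistaken for a struct field called "element".
        return False
    if (catalyst_struct_fields is not None
            and len(catalyst_struct_fields) == 1
            and catalyst_struct_fields[0] == repeated.fields[0]):
        return True
    return False
```

If the standard-layout check were moved below the struct match, a 3-level list whose element is a struct with a single field named `element` would be classified as a legacy 2-level list.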
[GitHub] spark pull request #13374: [SPARK-13638][SQL] Add escapeAll option to CSV Da...
Github user jurriaan commented on a diff in the pull request: https://github.com/apache/spark/pull/13374#discussion_r69283761 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala --- @@ -366,6 +366,32 @@ class CSVSuite extends QueryTest with SharedSQLContext with SQLTestUtils { } } + test("save csv with quoteAll enabled") { --- End diff -- Fixed :)
[GitHub] spark pull request #14013: [SPARK-16344][SQL] Decoding Parquet array of stru...
GitHub user liancheng opened a pull request: https://github.com/apache/spark/pull/14013 [SPARK-16344][SQL] Decoding Parquet array of struct with a single field named "element" ## What changes were proposed in this pull request? Please refer to [SPARK-16344][1] for details about this issue. ## How was this patch tested? New test case added in `ParquetQuerySuite`. [1]: https://issues.apache.org/jira/browse/SPARK-16344 You can merge this pull request into a Git repository by running: $ git pull https://github.com/liancheng/spark spark-16344-parquet-schema-corner-case Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14013.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14013 commit 9620b48d463ed2f2a8ede7397420050dc1e7d832 Author: Cheng Lian Date: 2016-07-01T10:52:29Z Fixes SPARK-16344
[GitHub] spark issue #14010: [GRAPHX][EXAMPLES] move graphx test data directory and u...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14010 Merged build finished. Test FAILed.
[GitHub] spark issue #14010: [GRAPHX][EXAMPLES] move graphx test data directory and u...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14010 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61603/ Test FAILed.
[GitHub] spark issue #14010: [GRAPHX][EXAMPLES] move graphx test data directory and u...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14010 **[Test build #61603 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61603/consoleFull)** for PR 14010 at commit [`9fc83f6`](https://github.com/apache/spark/commit/9fc83f6c086eafcd58523234c4a95eb25158632b). * This patch **fails from timeout after a configured wait of `250m`**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13494: [SPARK-15752] [SQL] support optimization for metadata on...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13494 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61607/ Test PASSed.
[GitHub] spark issue #13494: [SPARK-15752] [SQL] support optimization for metadata on...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13494 Merged build finished. Test PASSed.
[GitHub] spark issue #13494: [SPARK-15752] [SQL] support optimization for metadata on...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13494 **[Test build #61607 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61607/consoleFull)** for PR 13494 at commit [`a22e962`](https://github.com/apache/spark/commit/a22e9626e6294671e0915822def6eb283a72a643). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13494: [SPARK-15752] [SQL] support optimization for metadata on...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13494 Merged build finished. Test PASSed.
[GitHub] spark issue #13494: [SPARK-15752] [SQL] support optimization for metadata on...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13494 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61608/ Test PASSed.
[GitHub] spark issue #13494: [SPARK-15752] [SQL] support optimization for metadata on...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13494 **[Test build #61608 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61608/consoleFull)** for PR 13494 at commit [`41fef2c`](https://github.com/apache/spark/commit/41fef2c40f4929fd26476ecdfa3ee8160394a7d3). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13494: [SPARK-15752] [SQL] support optimization for metadata on...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13494 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61609/ Test FAILed.
[GitHub] spark issue #13494: [SPARK-15752] [SQL] support optimization for metadata on...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13494 Merged build finished. Test FAILed.
[GitHub] spark issue #13494: [SPARK-15752] [SQL] support optimization for metadata on...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13494 **[Test build #61609 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61609/consoleFull)** for PR 13494 at commit [`88f7308`](https://github.com/apache/spark/commit/88f7308173829ca2473690a0c409c438d3cd5cf4). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13925: [SPARK-16226][SQL]change the way of JDBC commit
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/13925 @srowen Maybe we should change this condition to `conn.getMetaData().supportsTransactions()`? I can prepare a PR.
[GitHub] spark issue #13494: [SPARK-15752] [SQL] support optimization for metadata on...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13494 **[Test build #61610 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61610/consoleFull)** for PR 13494 at commit [`2568193`](https://github.com/apache/spark/commit/2568193f91b9ae129c19a67bfd514065215840ac).
[GitHub] spark pull request #13894: [SPARK-15254][DOC] Improve ML pipeline Cross Vali...
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/13894#discussion_r69281183 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala --- @@ -56,7 +56,10 @@ private[ml] trait CrossValidatorParams extends ValidatorParams { /** * :: Experimental :: - * K-fold cross validation. + * CrossValidator begins by splitting the dataset into a set of non-overlapping randomly + * partitioned folds which are used as separate training and test datasets e.g., with k=3 folds, + * CrossValidator will generate 3 (training, test) dataset pairs, each of which uses 2/3 of + * the data for training and 1/3 for testing. Each fold is used in the testing set exactly once. --- End diff -- "used in the testing set" -> "used as the test set"
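The fold arithmetic in the proposed doc comment can be checked with a short sketch. This is plain Python, not Spark's `CrossValidator`; `k_fold_pairs` and its `seed` parameter are hypothetical names, and the round-robin slicing is only one simple way to obtain non-overlapping random folds.

```python
import random


def k_fold_pairs(rows, k, seed=0):
    """Split `rows` into k random, non-overlapping folds and return
    k (training, test) pairs; each fold is the test set exactly once."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    # Fold i takes every k-th element starting at offset i.
    folds = [shuffled[i::k] for i in range(k)]
    pairs = []
    for i in range(k):
        test = folds[i]
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        pairs.append((train, test))
    return pairs
```

With 9 rows and `k=3`, each pair holds 6 training rows (2/3) and 3 test rows (1/3), and the three test sets together cover every row once.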
[GitHub] spark pull request #13389: [SPARK-9876][SQL][FOLLOWUP] Enable string and bin...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/13389#discussion_r69280087 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/CatalystWriteSupport.scala --- @@ -150,7 +150,8 @@ private[parquet] class CatalystWriteSupport extends WriteSupport[InternalRow] wi case StringType => (row: SpecializedGetters, ordinal: Int) => - recordConsumer.addBinary(Binary.fromByteArray(row.getUTF8String(ordinal).getBytes)) + recordConsumer.addBinary( + Binary.fromReusedByteArray(row.getUTF8String(ordinal).getBytes)) --- End diff -- Thank you for your review! (Actually it is a `UTF8String`, so it has to be converted into a `String` to use `Binary.fromString`.) Though, I am a bit worried that the byte array might be reused in the future (although I think it is not reused for now). This can write corrupt statistics if it is reused. Is my understanding correct?
[GitHub] spark pull request #13374: [SPARK-13638][SQL] Add quoteAll option to CSV Dat...
Github user jurriaan commented on a diff in the pull request: https://github.com/apache/spark/pull/13374#discussion_r69279736 --- Diff: python/pyspark/sql/readwriter.py --- @@ -745,6 +748,8 @@ def csv(self, path, mode=None, compression=None, sep=None, quote=None, escape=No self.option("nullValue", nullValue) if escapeQuotes is not None: self.option("escapeQuotes", nullValue) +if escapeAll is not None: +self.option("escapeAll", nullValue) --- End diff -- Wow! We should fix this `escapeQuotes` thing too.
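The problem in the quoted diff is that `nullValue` is passed as the value for both `escapeQuotes` and `escapeAll`. A minimal sketch of the intended pattern, assuming each option should carry its own argument: `OptionRecorder` and `set_csv_options` are hypothetical stand-ins that only record options, not the PySpark writer.

```python
class OptionRecorder:
    """Hypothetical stand-in for a writer; it only records the options set."""

    def __init__(self):
        self.options = {}

    def option(self, key, value):
        self.options[key] = value
        return self


def set_csv_options(writer, nullValue=None, escapeQuotes=None, escapeAll=None):
    # Each option is set from its own argument, never from nullValue.
    if nullValue is not None:
        writer.option("nullValue", nullValue)
    if escapeQuotes is not None:
        writer.option("escapeQuotes", escapeQuotes)
    if escapeAll is not None:
        writer.option("escapeAll", escapeAll)
    return writer
```

A test that sets `nullValue` and `escapeQuotes` to different values would catch the copy-paste error, because the recorded `escapeQuotes` would otherwise equal `nullValue`.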
[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...
Github user janplus commented on a diff in the pull request: https://github.com/apache/spark/pull/14008#discussion_r69276506 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/StringExpressionsSuite.scala --- @@ -725,4 +725,43 @@ class StringExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper { checkEvaluation(FindInSet(Literal("abf"), Literal("abc,b,ab,c,def")), 0) checkEvaluation(FindInSet(Literal("ab,"), Literal("abc,b,ab,c,def")), 0) } + + test("ParseUrl") { +def checkParseUrl(expected: String, urlStr: String, partToExtract: String): Unit = { + checkEvaluation( +ParseUrl(Literal.create(urlStr, StringType), Literal.create(partToExtract, StringType)), +expected) +} +def checkParseUrlWithKey(expected: String, urlStr: String, + partToExtract: String, key: String): Unit = { + checkEvaluation( +ParseUrl(Literal.create(urlStr, StringType), Literal.create(partToExtract, StringType), + Literal.create(key, StringType)), expected) +} + +checkParseUrl("spark.apache.org", "http://spark.apache.org/path?query=1", "HOST") +checkParseUrl("/path", "http://spark.apache.org/path?query=1", "PATH") +checkParseUrl("query=1", "http://spark.apache.org/path?query=1", "QUERY") +checkParseUrl("Ref", "http://spark.apache.org/path?query=1#Ref", "REF") +checkParseUrl("http", "http://spark.apache.org/path?query=1", "PROTOCOL") +checkParseUrl("/path?query=1", "http://spark.apache.org/path?query=1", "FILE") +checkParseUrl("spark.apache.org:8080", "http://spark.apache.org:8080/path?query=1", "AUTHORITY") +checkParseUrl("jian", "http://j...@spark.apache.org/path?query=1", "USERINFO") --- End diff -- OK
[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...
Github user janplus commented on a diff in the pull request: https://github.com/apache/spark/pull/14008#discussion_r69276469 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -653,6 +655,128 @@ case class StringRPad(str: Expression, len: Expression, pad: Expression) } /** + * Extracts a part from a URL + */ +@ExpressionDescription( + usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL", + extended = "Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO\n" + + "key specifies which query to extract\n" + + "Examples:\n" + + " > SELECT _FUNC_('http://spark.apache.org/path?query=1', " + + "'HOST') FROM src LIMIT 1;\n" + " 'spark.apache.org'\n" + + " > SELECT _FUNC_('http://spark.apache.org/path?query=1', " + + "'QUERY') FROM src LIMIT 1;\n" + " 'query=1'\n" + + " > SELECT _FUNC_('http://spark.apache.org/path?query=1', " + + "'QUERY', 'query') FROM src LIMIT 1;\n" + " '1'") +case class ParseUrl(children: Expression*) + extends Expression with ImplicitCastInputTypes with CodegenFallback { + + override def nullable: Boolean = true + + override def inputTypes: Seq[DataType] = Seq.fill(children.size)(StringType) + override def dataType: DataType = StringType + + private lazy val stringExprs = children.toArray --- End diff -- Try to avoid Scala Seqs' potential performance problems. https://github.com/apache/spark/pull/13966/files#r69184719
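The part names listed in the usage string map onto standard URL components. As a rough illustration only, here is a Python sketch built on `urllib.parse`; it is not the `ParseUrl` expression, the function name `parse_url` is borrowed for readability, and returning `None` for a missing part is an assumption modeled on the expression being nullable.

```python
from urllib.parse import parse_qs, urlsplit


def parse_url(url, part, key=None):
    """Extract one named part of a URL; `key` selects a single query parameter."""
    u = urlsplit(url)
    if key is not None:
        # A key is only meaningful together with the QUERY part.
        if part != "QUERY":
            return None
        values = parse_qs(u.query).get(key)
        return values[0] if values else None
    userinfo = None
    if u.username is not None:
        userinfo = u.username
        if u.password is not None:
            userinfo += ":" + u.password
    file_part = u.path + ("?" + u.query if u.query else "")
    parts = {
        "HOST": u.hostname,
        "PATH": u.path or None,
        "QUERY": u.query or None,
        "REF": u.fragment or None,
        "PROTOCOL": u.scheme or None,
        "FILE": file_part or None,
        "AUTHORITY": u.netloc or None,
        "USERINFO": userinfo,
    }
    return parts.get(part)
```

For example, `parse_url("http://spark.apache.org/path?query=1", "QUERY", "query")` yields `"1"`, matching the third example in the usage string.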
[GitHub] spark issue #14012: [SPARK-16343][SQL] Improve the PushDownPredicate rule to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14012 Can one of the admins verify this patch?
[GitHub] spark issue #14012: [SPARK-16343][SQL] Improve the PushDownPredicate rule to...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/14012 cc @liancheng @cloud-fan
[GitHub] spark pull request #14012: [SPARK-16343][SQL] Improve the PushDownPredicate ...
GitHub user jiangxb1987 opened a pull request: https://github.com/apache/spark/pull/14012 [SPARK-16343][SQL] Improve the PushDownPredicate rule to pushdown pre… ## What changes were proposed in this pull request? Currently our Optimizer may reorder the predicates to run them more efficiently, but when a condition is non-deterministic, changing the order between the deterministic parts and the non-deterministic parts may change the number of input rows. For example: SELECT a FROM t WHERE rand() < 0.1 AND a = 1 and SELECT a FROM t WHERE a = 1 AND rand() < 0.1 may call rand() a different number of times, and therefore the output rows differ. This PR improves this by checking that the predicate is placed before any non-deterministic predicates. ## How was this patch tested? Expanded related test cases in FilterPushdownSuite. You can merge this pull request into a Git repository by running: $ git pull https://github.com/jiangxb1987/spark ppd Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14012.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14012 commit 856d86d788b318c2975a5318b181678f4b71f5bc Author: èæå Date: 2016-07-01T09:10:50Z [SPARK-16343][SQL] Improve the PushDownPredicate rule to pushdown predicates currectly in non-deterministic condition.
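The rule described above, pushing down only predicates that appear before any non-deterministic one, can be sketched as a split over the ordered conjuncts. This is a Python illustration of the description, not the `PushDownPredicate` code; `split_pushable` and the `(sql_text, is_deterministic)` tuples are hypothetical.

```python
from itertools import takewhile


def split_pushable(conjuncts):
    """Split ordered conjuncts into (pushable, remaining).

    `conjuncts` is a list of (sql_text, is_deterministic) tuples in the
    order they appear in the WHERE clause. Only the deterministic prefix
    is pushable; everything from the first non-deterministic predicate
    onward keeps its position so that it sees the same input rows.
    """
    pushable = list(takewhile(lambda c: c[1], conjuncts))
    remaining = conjuncts[len(pushable):]
    return pushable, remaining
```

For `rand() < 0.1 AND a = 1` nothing is pushable, because `a = 1` follows a non-deterministic predicate; for `a = 1 AND rand() < 0.1` only `a = 1` is pushable.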
[GitHub] spark issue #13494: [SPARK-15752] [SQL] support optimization for metadata on...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13494 **[Test build #61609 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61609/consoleFull)** for PR 13494 at commit [`88f7308`](https://github.com/apache/spark/commit/88f7308173829ca2473690a0c409c438d3cd5cf4).
[GitHub] spark issue #14002: [SPARK-16335][SQL] Structured streaming should fail if s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14002 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61605/ Test PASSed.
[GitHub] spark issue #14002: [SPARK-16335][SQL] Structured streaming should fail if s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14002 Merged build finished. Test PASSed.
[GitHub] spark issue #13494: [SPARK-15752] [SQL] support optimization for metadata on...
Github user lianhuiwang commented on the issue: https://github.com/apache/spark/pull/13494 @cloud-fan I have updated with your branch code. Thanks a lot.
[GitHub] spark issue #14002: [SPARK-16335][SQL] Structured streaming should fail if s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14002 **[Test build #61605 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61605/consoleFull)** for PR 14002 at commit [`2e5f2ef`](https://github.com/apache/spark/commit/2e5f2efb5481ae900c9c87fd9daf180a18347998). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #13919: [SPARK-16222] [SQL] JDBC Sources - Handling illeg...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13919
[GitHub] spark issue #13919: [SPARK-16222] [SQL] JDBC Sources - Handling illegal inpu...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/13919 Merged to master/2.0
[GitHub] spark issue #13494: [SPARK-15752] [SQL] support optimization for metadata on...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13494 **[Test build #61608 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61608/consoleFull)** for PR 13494 at commit [`41fef2c`](https://github.com/apache/spark/commit/41fef2c40f4929fd26476ecdfa3ee8160394a7d3).
[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/12601#discussion_r69267676
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala ---
@@ -19,37 +19,105 @@ package org.apache.spark.sql.execution.datasources.jdbc
 import java.util.Properties
-import org.apache.spark.sql.SQLContext
-import org.apache.spark.sql.sources.{BaseRelation, DataSourceRegister, RelationProvider}
+import org.apache.spark.sql.{DataFrame, SaveMode, SQLContext}
+import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider, DataSourceRegister, RelationProvider, SchemaRelationProvider}
+import org.apache.spark.sql.types.StructType
-class JdbcRelationProvider extends RelationProvider with DataSourceRegister {
+class JdbcRelationProvider extends CreatableRelationProvider
+  with SchemaRelationProvider with RelationProvider with DataSourceRegister {
   override def shortName(): String = "jdbc"
-  /** Returns a new base relation with the given parameters. */
   override def createRelation(
       sqlContext: SQLContext,
       parameters: Map[String, String]): BaseRelation = {
-    val jdbcOptions = new JDBCOptions(parameters)
-    if (jdbcOptions.partitionColumn != null
-      && (jdbcOptions.lowerBound == null
-        || jdbcOptions.upperBound == null
-        || jdbcOptions.numPartitions == null)) {
+    createRelation(sqlContext, parameters, null)
+  }
+
+  /** Returns a new base relation with the given parameters. */
+  override def createRelation(
+      sqlContext: SQLContext,
+      parameters: Map[String, String],
+      schema: StructType): BaseRelation = {
+    val url = parameters.getOrElse("url", sys.error("Option 'url' not specified"))
+    val table = parameters.getOrElse("dbtable", sys.error("Option 'dbtable' not specified"))
+    val partitionColumn = parameters.getOrElse("partitionColumn", null)
+    val lowerBound = parameters.getOrElse("lowerBound", null)
+    val upperBound = parameters.getOrElse("upperBound", null)
+    val numPartitions = parameters.getOrElse("numPartitions", null)
--- End diff --
I think the validation can be done together in `JDBCOptions`.
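To illustrate the suggestion, here is a minimal sketch of what gathering the validation in one options class might look like. `JdbcOptionsSketch` is a hypothetical stand-in, not Spark's actual `JDBCOptions`. The option names are taken from the diff, the partitioning rule mirrors the check removed from `createRelation`, and the use of `Option` instead of `null` and the wording of the last error message are assumptions.

```scala
// Hypothetical stand-in for JDBCOptions; NOT Spark's actual class.
class JdbcOptionsSketch(parameters: Map[String, String]) {
  // Required options: fail fast at construction time when one is missing.
  val url: String =
    parameters.getOrElse("url", sys.error("Option 'url' not specified"))
  val table: String =
    parameters.getOrElse("dbtable", sys.error("Option 'dbtable' not specified"))

  // Optional partitioning options, modeled as Option rather than null (assumption).
  val partitionColumn: Option[String] = parameters.get("partitionColumn")
  val lowerBound: Option[String] = parameters.get("lowerBound")
  val upperBound: Option[String] = parameters.get("upperBound")
  val numPartitions: Option[String] = parameters.get("numPartitions")

  // Same rule as the removed check: a partition column requires both bounds
  // and the partition count. The message text is an assumption.
  require(
    partitionColumn.isEmpty ||
      (lowerBound.isDefined && upperBound.isDefined && numPartitions.isDefined),
    "Partitioning incompletely specified")
}
```

With something along these lines, both `createRelation` overloads could construct the options object once and rely on it having been validated, rather than repeating the `getOrElse` checks at each call site.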
[GitHub] spark issue #13494: [SPARK-15752] [SQL] support optimization for metadata on...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13494 **[Test build #61607 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61607/consoleFull)** for PR 13494 at commit [`a22e962`](https://github.com/apache/spark/commit/a22e9626e6294671e0915822def6eb283a72a643).