[GitHub] spark pull request: [SPARK-15313][SQL] EmbedSerializerInFilter rul...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13096
---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-15360][Spark-Submit]Should print spark-...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13163#issuecomment-220523618 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58937/ Test PASSed.
[GitHub] spark pull request: [SPARK-15313][SQL] EmbedSerializerInFilter rul...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/13096#issuecomment-220523585 Alright I'm going to merge this in master/2.0. Thanks.
[GitHub] spark pull request: [SPARK-15360][Spark-Submit]Should print spark-...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13163#issuecomment-220523617 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15114][SQL] Column name generated by ty...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/13045#discussion_r63994025
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Column.scala ---
@@ -37,6 +38,14 @@ private[sql] object Column {
   def apply(expr: Expression): Column = new Column(expr)
   def unapply(col: Column): Option[Expression] = Some(col.expr)
+
+  private[sql] def generateAlias(e: Expression, index: Int): String = {
+    e match {
+      case a: AggregateExpression if a.aggregateFunction.isInstanceOf[TypedAggregateExpression] =>
+        s"${a.aggregateFunction.prettyName}_c${index}"
--- End diff --
@cloud-fan Looks like the following. Let's go with this? I will drop the index parameter.
```SQL
+-------------------+-------------------+
|TypedSumDouble(int)|TypedSumDouble(int)|
+-------------------+-------------------+
|               11.0|               11.0|
+-------------------+-------------------+
```
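For context, the collision that an index suffix avoids can be shown with a tiny, Spark-free sketch. The object and helper below are illustrative only; the `generateAlias` in the diff above operates on Catalyst expressions, not plain strings:

```scala
// Illustrative sketch (not Spark's implementation): two identical typed
// aggregates would otherwise render to the same column name, so a
// positional suffix keeps the generated aliases distinct.
object AliasSketch {
  def generateAlias(prettyName: String, index: Int): String =
    s"${prettyName}_c$index"

  def main(args: Array[String]): Unit = {
    // Same aggregate applied twice, as in the show() output above.
    val funcs = Seq("typedsumdouble", "typedsumdouble")
    val aliases = funcs.zipWithIndex.map { case (n, i) => generateAlias(n, i) }
    // The suffix disambiguates the otherwise-identical columns.
    assert(aliases.distinct.size == funcs.size)
    println(aliases.mkString(", ")) // prints "typedsumdouble_c0, typedsumdouble_c1"
  }
}
```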
[GitHub] spark pull request: [SPARK-15360][Spark-Submit]Should print spark-...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13163#issuecomment-220523513 **[Test build #58937 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58937/consoleFull)** for PR 13163 at commit [`2941e62`](https://github.com/apache/spark/commit/2941e6273d064376f0e540fa0655c345d9c52461).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15431][SQL] Support LIST FILE(s)|JAR(s)...
Github user xwu0226 commented on the pull request: https://github.com/apache/spark/pull/13212#issuecomment-220523283 @rxin Updated the code. Please help double check. Thank you very much!
[GitHub] spark pull request: [SPARK-15335] [SQL] Implement TRUNCATE TABLE C...
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/13170#issuecomment-220523237 LGTM
[GitHub] spark pull request: [SPARK-15379][SQL] check special invalid date
Github user wangyang1992 commented on the pull request: https://github.com/apache/spark/pull/13169#issuecomment-220523060 Addressed your comments. @cloud-fan
[GitHub] spark pull request: Revert "[SPARK-10216][SQL] Avoid creating empt...
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/13181#issuecomment-220522607
@marmbrus I tested and could reproduce the exceptions for reading in https://issues.apache.org/jira/browse/SPARK-15393, but it seems this might not be the reason. I tested the code below on https://github.com/apache/spark/commit/c0c3ec35476c756e569a1f34c4b258eb0490585c (right before this PR) and on the master branch.
```scala
test("SPARK-15393: create empty file") {
  withSQLConf(SQLConf.SHUFFLE_PARTITIONS.key -> "10") {
    withTempPath { path =>
      val schema = StructType(
        StructField("k", StringType, true) ::
        StructField("v", IntegerType, false) :: Nil)
      val emptyDf = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)
      emptyDf.write
        .format("parquet")
        .save(path.getCanonicalPath)
      val copyEmptyDf = spark.read
        .format("parquet")
        .load(path.getCanonicalPath)
      copyEmptyDf.show()
    }
  }
}
```
Both produce the exception below:
```scala
Unable to infer schema for ParquetFormat at /private/var/folders/9j/gf_c342d7d150mwrxvkqnc18gn/T/spark-98dfbe86-afca-413e-9be7-46ff18bac443. It must be specified manually;
org.apache.spark.sql.AnalysisException: Unable to infer schema for ParquetFormat at /private/var/folders/9j/gf_c342d7d150mwrxvkqnc18gn/T/spark-98dfbe86-afca-413e-9be7-46ff18bac443. It must be specified manually;
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$16.apply(DataSource.scala:324)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$16.apply(DataSource.scala:324)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:323)
```
I will try to figure out why, but please feel free to revert this if you think my PR is problematic. I will fix both issues together later anyway.
[GitHub] spark pull request: [SPARK-15165][SPARK-15205][SQL] Introduce plac...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12979#issuecomment-220522052 **[Test build #58945 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58945/consoleFull)** for PR 12979 at commit [`6ebfa10`](https://github.com/apache/spark/commit/6ebfa10d63a2234dae4e567f06279f5e7feb3df9).
[GitHub] spark pull request: [MINOR] Fix Typos
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/13078#issuecomment-220521732 @srowen we should backport the doc fixes into branch-2.0.
[GitHub] spark pull request: [SPARK-15206][SQL] add testcases for distinct ...
Github user xwu0226 commented on the pull request: https://github.com/apache/spark/pull/12984#issuecomment-220521657 @cloud-fan Please see if we should add these test cases to the 2.0 branch. It is related to the distinct aggregate in the HAVING clause. Thanks!
[GitHub] spark pull request: [SPARK-15075][SPARK-15345][SQL] Clean up Spark...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/13200#discussion_r63993278
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
@@ -735,29 +731,130 @@ object SparkSession {
   }

   /**
-   * Gets an existing [[SparkSession]] or, if there is no existing one, creates a new one
-   * based on the options set in this builder.
+   * Gets an existing [[SparkSession]] or, if there is no existing one, creates a new
+   * one based on the options set in this builder.
+   *
+   * This method first checks whether there is a valid thread-local SparkSession,
+   * and if yes, return that one. It then checks whether there is a valid global
+   * default SparkSession, and if yes, return that one. If no valid global default
+   * SparkSession exists, the method creates a new SparkSession and assigns the
+   * newly created SparkSession as the global default.
+   *
+   * In case an existing SparkSession is returned, the config options specified in
+   * this builder will be applied to the existing SparkSession.
    *
    * @since 2.0.0
    */
   def getOrCreate(): SparkSession = synchronized {
-    // Step 1. Create a SparkConf
-    // Step 2. Get a SparkContext
-    // Step 3. Get a SparkSession
-    val sparkConf = new SparkConf()
-    options.foreach { case (k, v) => sparkConf.set(k, v) }
-    val sparkContext = SparkContext.getOrCreate(sparkConf)
-
-    SQLContext.getOrCreate(sparkContext).sparkSession
+    // Get the session from current thread's active session.
+    var session = activeThreadSession.get()
+    if ((session ne null) && !session.sparkContext.isStopped) {
+      options.foreach { case (k, v) => session.conf.set(k, v) }
+      return session
+    }
+
+    // Global synchronization so we will only set the default session once.
+    SparkSession.synchronized {
+      // If the current thread does not have an active session, get it from the global session.
+      session = defaultSession.get()
+      if ((session ne null) && !session.sparkContext.isStopped) {
+        options.foreach { case (k, v) => session.conf.set(k, v) }
+        return session
+      }
+
+      // No active nor global default session. Create a new one.
+      val sparkContext = userSuppliedContext.getOrElse {
+        // set app name if not given
+        if (!options.contains("spark.app.name")) {
+          options += "spark.app.name" -> java.util.UUID.randomUUID().toString
+        }
+
+        val sparkConf = new SparkConf()
+        options.foreach { case (k, v) => sparkConf.set(k, v) }
+        SparkContext.getOrCreate(sparkConf)
+      }
+      session = new SparkSession(sparkContext)
+      options.foreach { case (k, v) => session.conf.set(k, v) }
+      defaultSession.set(session)
--- End diff --
@rxin Ok. Got it. Thank you.
[GitHub] spark pull request: [SPARK-8603][SPARKR] Incorrect file separator ...
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/13165#issuecomment-220521666
1. A rough scan of the test failures shows most of them are probably related to path handling. You can replay the failed test case in R on Windows. For debug facilities in R, refer to http://www.inside-r.org/r-doc/base/traceback, https://stat.ethz.ch/R-manual/R-devel/library/base/html/debug.html, http://www.inside-r.org/r-doc/base/browser
2. You can add a new file named test_Windows.R under R/pkg/inst/tests/testthat
3. Sure.
[GitHub] spark pull request: [DOC][MINOR] ml.feature Scala and Python API s...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/13159#issuecomment-220521612 @MLnick did you actually merge this in 2.0?
[GitHub] spark pull request: [SPARK-15165][SPARK-15205][SQL] Introduce plac...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12979#issuecomment-220521459 **[Test build #58944 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58944/consoleFull)** for PR 12979 at commit [`3c49567`](https://github.com/apache/spark/commit/3c495670716ecf63d21110f7c7ee93500051d26a).
[GitHub] spark pull request: [SPARK-15075][SPARK-15345][SQL] Clean up Spark...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/13200#discussion_r63993145
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
[Quoted diff omitted: the same `getOrCreate` hunk shown in the earlier comment on this pull request.]
--- End diff --
We would create a new one in that case ... I'm not too worried about the legacy corner cases here though.
[GitHub] spark pull request: [SPARK-15078][SQL] Add all TPCDS 1.4 benchmark...
Github user sameeragarwal commented on a diff in the pull request: https://github.com/apache/spark/pull/13188#discussion_r63993020
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/tpcds/TPCDSQueryBenchmark.scala ---
@@ -0,0 +1,132 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.benchmark.tpcds
+
+import java.io.File
+
+import org.apache.spark.SparkConf
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation
+import org.apache.spark.sql.catalyst.expressions.SubqueryExpression
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.catalyst.util._
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.util.Benchmark
+
+/**
+ * Benchmark to measure TPCDS query performance.
+ * To run this:
+ * spark-submit --class --jars
+ */
+object TPCDSQueryBenchmark {
+  val conf =
+    new SparkConf()
+      .setMaster("local[1]")
+      .setAppName("test-sql-context")
+      .set("spark.sql.parquet.compression.codec", "snappy")
+      .set("spark.sql.shuffle.partitions", "4")
+      .set("spark.driver.memory", "3g")
+      .set("spark.executor.memory", "3g")
+      .set("spark.sql.autoBroadcastJoinThreshold", (20 * 1024 * 1024).toString)
+
+  val spark = SparkSession.builder.config(conf).getOrCreate()
+
+  val tables = Seq("catalog_page", "catalog_returns", "customer", "customer_address",
+    "customer_demographics", "date_dim", "household_demographics", "inventory", "item",
+    "promotion", "store", "store_returns", "catalog_sales", "web_sales", "store_sales",
+    "web_returns", "web_site", "reason", "call_center", "warehouse", "ship_mode", "income_band",
+    "time_dim", "web_page")
+
+  def setupTables(dataLocation: String): Map[String, Long] = {
+    tables.map { tableName =>
+      spark.read.parquet(s"$dataLocation/$tableName").createOrReplaceTempView(tableName)
+      tableName -> spark.table(tableName).count()
+    }.toMap
+  }
+
+  def tpcdsAll(dataLocation: String, queries: Seq[String]): Unit = {
+    require(dataLocation.nonEmpty,
+      "please modify the value of dataLocation to point to your local TPCDS data")
+    val tableSizes = setupTables(dataLocation)
+    spark.conf.set(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key, "true")
+    spark.conf.set(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key, "true")
+    queries.foreach { name =>
+      val queriesString = fileToString(new File(s"sql/core/src/test/scala/org/apache/spark/sql/" +
        s"execution/benchmark/tpcds/queries/$name.sql"))
+
+      // This is an indirect hack to estimate the size of each query's input by traversing the
+      // logical plan and adding up the sizes of all tables that appear in the plan. Note that this
+      // currently doesn't take WITH subqueries into account which might lead to fairly inaccurate
+      // per-row processing time for those cases.
+      val queryRelations = scala.collection.mutable.HashSet[String]()
+      spark.sql(queriesString).queryExecution.logical.map {
+        case ur @ UnresolvedRelation(t: TableIdentifier, _) =>
+          queryRelations.add(t.table)
+        case lp: LogicalPlan =>
+          lp.expressions.foreach { _ foreach {
+            case subquery: SubqueryExpression =>
+              subquery.plan.foreach {
+                case ur @ UnresolvedRelation(t: TableIdentifier, _) =>
+                  queryRelations.add(t.table)
+                case _ =>
+              }
+            case _ =>
+          }
+          }
+        case _ =>
+      }
+      val numRows = queryRelations.map(tableSizes.getOrElse(_, 0L)).sum
+      val benchmark = new Benchmark("TPCDS Snappy", numRows, 5)
+
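The input-size estimation idea quoted above can be sketched independently of Spark. This is an illustrative reduction of the technique, not the benchmark's actual code; the object and parameter names are invented:

```scala
// Standalone sketch of the estimation hack from the benchmark above:
// collect the distinct table names a query references and sum their
// known row counts; unknown tables contribute zero rows.
object InputSizeEstimate {
  def estimatedRows(referencedTables: Seq[String], tableSizes: Map[String, Long]): Long =
    referencedTables.distinct.map(tableSizes.getOrElse(_, 0L)).sum

  def main(args: Array[String]): Unit = {
    val sizes = Map("store_sales" -> 1000L, "date_dim" -> 365L)
    // A query scanning store_sales twice and date_dim once counts each
    // table only once, mirroring the HashSet in the benchmark.
    val rows = estimatedRows(Seq("store_sales", "date_dim", "store_sales", "unknown"), sizes)
    println(rows) // prints "1365"
  }
}
```

As the quoted comment notes, this style of estimate ignores WITH subqueries, so per-row timings derived from it can be inaccurate for such queries.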
[GitHub] spark pull request: [SPARK-15313][SQL] EmbedSerializerInFilter rul...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/13096#issuecomment-220521107 LGTM, thanks for finding this bug!
[GitHub] spark pull request: [SPARK-15057][GRAPHX] Remove stale TODO commen...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/12839#issuecomment-220521039 Since this is very low risk, I'm going to cherry-pick this in branch-2.0 too to minimize the diff.
[GitHub] spark pull request: [SPARK-15313][SQL] EmbedSerializerInFilter rul...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13096#discussion_r63992991
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -1560,7 +1561,14 @@ object EmbedSerializerInFilter extends Rule[LogicalPlan] {
     val newCondition = condition transform {
       case a: Attribute if a == d.output.head => d.deserializer
     }
-    Filter(newCondition, d.child)
+    val filter = Filter(newCondition, d.child)
+
+    // Adds an extra Project here, to preserve the output expr id of `SerializeFromObject`.
+    // We will remove it later in RemoveAliasOnlyProject rule.
+    val objAttrs = filter.output.zip(s.output).map { case (fout, sout) =>
--- End diff --
I'd say it's not object attributes, maybe we should just name it `attrs`
[GitHub] spark pull request: [SPARK-15308][SQL] RowEncoder should preserve ...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/13090#issuecomment-220520673 LGTM except one style comment, thanks for working on it!
[GitHub] spark pull request: [SPARK-15308][SQL] RowEncoder should preserve ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13090#discussion_r63992806
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/RowEncoder.scala ---
@@ -149,12 +149,12 @@ object RowEncoder {
       dataType = t)

     case StructType(fields) =>
-      val convertedFields = fields.zipWithIndex.map { case (f, i) =>
+      val convertedFields = fields.zipWithIndex.flatMap { case (f, i) =>
--- End diff --
We can follow the style in https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala#L527:
```
val nonNullOutput = CreateNamedStruct(fields.zipWithIndex.flatMap { case (field, index) =>
  ...
}
```
[GitHub] spark pull request: [SPARK-15075][SPARK-15345][SQL] Clean up Spark...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/13200#discussion_r63992730
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
[Quoted diff omitted: the same `getOrCreate` hunk shown in the earlier comment on this pull request.]
--- End diff --
@rxin Hi Reynold, I had a minor question just for my understanding. When users do a new SQLContext(), we create an implicit SparkSession. Should this session be made the defaultSession? If we call 1) new SQLContext, then 2) builder.getOrCreate(), what is the expected behaviour?
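The lookup order the diff implements (thread-local active session first, then the global default, then create-and-promote) can be shown with a minimal, Spark-free sketch. The `Session` and `SessionRegistry` names here are invented for illustration and deliberately omit the config-propagation and stopped-context checks:

```scala
import java.util.concurrent.atomic.AtomicReference

final class Session(val id: Int)

object SessionRegistry {
  // Thread-local "active" session, checked first.
  private val activeThreadSession = new ThreadLocal[Session]
  // Process-wide default session, checked second.
  private val defaultSession = new AtomicReference[Session](null)
  private var counter = 0

  def setActive(s: Session): Unit = activeThreadSession.set(s)

  def getOrCreate(): Session = synchronized {
    val active = activeThreadSession.get()
    if (active != null) return active
    val default = defaultSession.get()
    if (default != null) return default
    // No active nor default session: create one and promote it to default.
    counter += 1
    val created = new Session(counter)
    defaultSession.set(created)
    created
  }
}

object SessionRegistryDemo {
  def main(args: Array[String]): Unit = {
    val a = SessionRegistry.getOrCreate() // creates and becomes the default
    val b = SessionRegistry.getOrCreate() // returns that same default
    println(s"same session: ${a eq b}") // prints "same session: true"
  }
}
```

The question raised in the comment is precisely which step of this chain a session created implicitly by `new SQLContext()` should participate in, i.e. whether it should be promoted to the default.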
[GitHub] spark pull request: [SPARK-14261][SQL] Memory leak in Spark Thrift...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/12932
[GitHub] spark pull request: [SPARK-14261][SQL] Memory leak in Spark Thrift...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/12932#issuecomment-220519882 I'm going to merge this in master/2.0/1.6. Thanks.
[GitHub] spark pull request: [SPARK-15433][PySpark] PySpark core test shoul...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13214#issuecomment-220519810 **[Test build #58943 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58943/consoleFull)** for PR 13214 at commit [`760a4cd`](https://github.com/apache/spark/commit/760a4cda44585c6039fd8954fc43d57174a5cf27).
[GitHub] spark pull request: [SPARK-15282][SQL] PushDownPredicate should no...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13087#discussion_r63992511

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -1025,7 +1025,8 @@ object PushDownPredicate extends Rule[LogicalPlan] with PredicateHelper {
     // state and all the input rows processed before. In another word, the order of input rows
     // matters for non-deterministic expressions, while pushing down predicates changes the order.
     case filter @ Filter(condition, project @ Project(fields, grandChild))
-      if fields.forall(_.deterministic) =>
+      if fields.forall(_.deterministic) &&
+        fields.forall(_.find(_.isInstanceOf[ScalaUDF]).isEmpty) =>
--- End diff --

I'm not sure if I understand this correctly: do you mean `ScalaUDF` can be nondeterministic and we should always treat it as a nondeterministic expression? If so, I think a better idea is to just override `deterministic` in `ScalaUDF` and always return false.
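The guard being debated above reduces to a simple predicate: a filter may be pushed below a project only when every projected field is deterministic. A minimal sketch, using a hypothetical mini-AST in Python rather than Catalyst's actual expression classes, with cloud-fan's proposed alternative (ScalaUDF always reports nondeterministic) baked in:

```python
# Hypothetical mini-AST illustrating the pushdown guard (not Catalyst code).
class Attr:
    deterministic = True     # plain column reference

class Rand:
    deterministic = False    # inherently nondeterministic

class ScalaUDF:
    # The alternative proposed in the comment: always report False,
    # instead of special-casing ScalaUDF in the rule's pattern guard.
    deterministic = False

def can_push_filter_below_project(fields):
    # Pushing a filter below a project reorders the input rows seen by
    # the projected expressions, which is only safe if they are all
    # deterministic.
    return all(f.deterministic for f in fields)
```

With this design, the optimizer rule keeps its single `fields.forall(_.deterministic)` condition and the UDF-specific knowledge lives in the expression itself.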
[GitHub] spark pull request: [SPARK-15313][SQL] EmbedSerializerInFilter rul...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/13096#issuecomment-220519361 cc @cloud-fan
[GitHub] spark pull request: [SPARK-15424][SQL] Revert SPARK-14807 Create a...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13207#issuecomment-220519275 **[Test build #58942 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58942/consoleFull)** for PR 13207 at commit [`c77616e`](https://github.com/apache/spark/commit/c77616e1ecb020e0657813fa6f14d6aa7f4688d4).
[GitHub] spark pull request: [SPARK-15078][SQL] Add all TPCDS 1.4 benchmark...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/13188#discussion_r63992275 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/tpcds/TPCDSQueryBenchmark.scala --- @@ -0,0 +1,132 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.benchmark.tpcds + +import java.io.File + +import org.apache.spark.SparkConf +import org.apache.spark.sql.SparkSession +import org.apache.spark.sql.catalyst.TableIdentifier +import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation +import org.apache.spark.sql.catalyst.expressions.SubqueryExpression +import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan +import org.apache.spark.sql.catalyst.util._ +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.util.Benchmark + +/** + * Benchmark to measure TPCDS query performance. 
+ * To run this: + * spark-submit --class --jars + */ +object TPCDSQueryBenchmark { + val conf = +new SparkConf() + .setMaster("local[1]") + .setAppName("test-sql-context") + .set("spark.sql.parquet.compression.codec", "snappy") + .set("spark.sql.shuffle.partitions", "4") + .set("spark.driver.memory", "3g") + .set("spark.executor.memory", "3g") + .set("spark.sql.autoBroadcastJoinThreshold", (20 * 1024 * 1024).toString) + + val spark = SparkSession.builder.config(conf).getOrCreate() + + val tables = Seq("catalog_page", "catalog_returns", "customer", "customer_address", +"customer_demographics", "date_dim", "household_demographics", "inventory", "item", +"promotion", "store", "store_returns", "catalog_sales", "web_sales", "store_sales", +"web_returns", "web_site", "reason", "call_center", "warehouse", "ship_mode", "income_band", +"time_dim", "web_page") + + def setupTables(dataLocation: String): Map[String, Long] = { +tables.map { tableName => + spark.read.parquet(s"$dataLocation/$tableName").createOrReplaceTempView(tableName) + tableName -> spark.table(tableName).count() +}.toMap + } + + def tpcdsAll(dataLocation: String, queries: Seq[String]): Unit = { +require(dataLocation.nonEmpty, + "please modify the value of dataLocation to point to your local TPCDS data") +val tableSizes = setupTables(dataLocation) +spark.conf.set(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key, "true") +spark.conf.set(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key, "true") +queries.foreach { name => + val queriesString = fileToString(new File(s"sql/core/src/test/scala/org/apache/spark/sql/" + --- End diff -- one thing - these files should go into test/resources, and then we can get their path using the getresource function on the current thread's classloader. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. 
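rxin's suggestion above is to bundle the query files under test/resources and resolve them through the classloader (in Scala, `Thread.currentThread().getContextClassLoader.getResource(...)`) rather than a hard-coded source-tree path. A hedged sketch of the same pattern, transposed to Python's resource machinery for illustration; the package and resource names here are assumptions:

```python
# Resolve a bundled query file via package resources instead of a
# hard-coded filesystem path (Python analogue of classloader getResource).
import importlib.resources

def query_text(package, name):
    # Returns the SQL text of "<name>.sql" shipped inside `package`,
    # or None if no such resource exists.
    ref = importlib.resources.files(package) / f"{name}.sql"
    return ref.read_text() if ref.is_file() else None
```

The benefit in either language is the same: the benchmark keeps working when run from a packaged jar/wheel, not only from a source checkout.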
[GitHub] spark pull request: [SPARK-15433][PySpark] PySpark core test shoul...
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/13214 [SPARK-15433][PySpark] PySpark core test should not use SerDe from PythonMLLibAPI ## What changes were proposed in this pull request? Currently the PySpark core test uses the `SerDe` from `PythonMLLibAPI`, which pulls in many MLlib things. It should use `SerDeUtil` instead. ## How was this patch tested? Existing tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/viirya/spark-1 pycore-use-serdeutil Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13214.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13214 commit 760a4cda44585c6039fd8954fc43d57174a5cf27 Author: Liang-Chi Hsieh Date: 2016-05-20T05:12:47Z PySpark core test should not use SerDe from PythonMLLibAPI.
[GitHub] spark pull request: [SPARK-15431][SQL] Support LIST FILE(s)|JAR(s)...
Github user xwu0226 commented on a diff in the pull request: https://github.com/apache/spark/pull/13212#discussion_r63992232

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1387,6 +1387,27 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
   }

   /**
+   * Return a list of file paths that are added to resources.
+   * If file paths are provided, return the ones that are added to resources.
+   */
+  def listFiles(files: Seq[String] = Seq.empty[String]): Seq[String] = {
--- End diff --

@rxin OK. Thanks!
[GitHub] spark pull request: [SPARK-15078][SQL] Add all TPCDS 1.4 benchmark...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/13188#discussion_r63992224 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/tpcds/TPCDSQueryBenchmark.scala --- @@ -0,0 +1,132 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.benchmark.tpcds + +import java.io.File + +import org.apache.spark.SparkConf +import org.apache.spark.sql.SparkSession +import org.apache.spark.sql.catalyst.TableIdentifier +import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation +import org.apache.spark.sql.catalyst.expressions.SubqueryExpression +import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan +import org.apache.spark.sql.catalyst.util._ +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.util.Benchmark + +/** + * Benchmark to measure TPCDS query performance. 
+ * To run this: + * spark-submit --class --jars + */ +object TPCDSQueryBenchmark { + val conf = +new SparkConf() + .setMaster("local[1]") + .setAppName("test-sql-context") + .set("spark.sql.parquet.compression.codec", "snappy") + .set("spark.sql.shuffle.partitions", "4") + .set("spark.driver.memory", "3g") + .set("spark.executor.memory", "3g") + .set("spark.sql.autoBroadcastJoinThreshold", (20 * 1024 * 1024).toString) + + val spark = SparkSession.builder.config(conf).getOrCreate() + + val tables = Seq("catalog_page", "catalog_returns", "customer", "customer_address", +"customer_demographics", "date_dim", "household_demographics", "inventory", "item", +"promotion", "store", "store_returns", "catalog_sales", "web_sales", "store_sales", +"web_returns", "web_site", "reason", "call_center", "warehouse", "ship_mode", "income_band", +"time_dim", "web_page") + + def setupTables(dataLocation: String): Map[String, Long] = { +tables.map { tableName => + spark.read.parquet(s"$dataLocation/$tableName").createOrReplaceTempView(tableName) + tableName -> spark.table(tableName).count() +}.toMap + } + + def tpcdsAll(dataLocation: String, queries: Seq[String]): Unit = { +require(dataLocation.nonEmpty, + "please modify the value of dataLocation to point to your local TPCDS data") +val tableSizes = setupTables(dataLocation) +spark.conf.set(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key, "true") +spark.conf.set(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key, "true") +queries.foreach { name => + val queriesString = fileToString(new File(s"sql/core/src/test/scala/org/apache/spark/sql/" + +s"execution/benchmark/tpcds/queries/$name.sql")) + + // This is an indirect hack to estimate the size of each query's input by traversing the + // logical plan and adding up the sizes of all tables that appear in the plan. Note that this + // currently doesn't take WITH subqueries into account which might lead to fairly inaccurate + // per-row processing time for those cases. 
+ val queryRelations = scala.collection.mutable.HashSet[String]() + spark.sql(queriesString).queryExecution.logical.map { +case ur @ UnresolvedRelation(t: TableIdentifier, _) => + queryRelations.add(t.table) +case lp: LogicalPlan => + lp.expressions.foreach { _ foreach { +case subquery: SubqueryExpression => + subquery.plan.foreach { +case ur @ UnresolvedRelation(t: TableIdentifier, _) => + queryRelations.add(t.table) +case _ => + } +case _ => + } +} +case _ => + } + val numRows = queryRelations.map(tableSizes.getOrElse(_, 0L)).sum + val benchmark = new Benchmark("TPCDS Snappy", numRows, 5) + benchmark.addCase(name) {
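The "indirect hack" described in the comment above boils down to: walk the logical plan (including subquery plans), collect every table name it references, and sum the pre-computed per-table row counts. A rough sketch with hypothetical toy node types standing in for Spark's `LogicalPlan`:

```python
# Toy plan nodes (stand-ins for UnresolvedRelation and other LogicalPlan
# nodes) used to illustrate the input-size estimation hack.
class Relation:
    def __init__(self, table):
        self.table = table
        self.children = []

class Node:
    def __init__(self, *children):
        self.table = None
        self.children = list(children)

def estimate_input_rows(plan, table_sizes):
    # Collect every distinct table the plan references, then sum their
    # pre-computed row counts; unknown tables contribute 0.
    tables = set()
    stack = [plan]
    while stack:
        node = stack.pop()
        if node.table is not None:
            tables.add(node.table)
        stack.extend(node.children)
    return sum(table_sizes.get(t, 0) for t in tables)
```

As the benchmark's comment notes, an estimate like this misses WITH (CTE) subqueries and counts each table once even if it is scanned several times, so per-row timings are approximate.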
[GitHub] spark pull request: [SPARK-15363][ML][Example]:Example code should...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13213#issuecomment-220519177 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58941/ Test PASSed.
[GitHub] spark pull request: [SPARK-14990][SQL] nvl, coalesce, array with p...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/12768
[GitHub] spark pull request: [SPARK-15363][ML][Example]:Example code should...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13213#issuecomment-220519176 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15363][ML][Example]:Example code should...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13213#issuecomment-220519138 **[Test build #58941 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58941/consoleFull)** for PR 13213 at commit [`818dc7f`](https://github.com/apache/spark/commit/818dc7fe8f1be835243de8d096d43b229e356cbc). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14990][SQL] Fix checkForSameTypeInputEx...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13208
[GitHub] spark pull request: [SPARK-15431][SQL] Support LIST FILE(s)|JAR(s)...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/13212#discussion_r63992144

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1387,6 +1387,27 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
   }

   /**
+   * Return a list of file paths that are added to resources.
+   * If file paths are provided, return the ones that are added to resources.
+   */
+  def listFiles(files: Seq[String] = Seq.empty[String]): Seq[String] = {
--- End diff --

No, they can filter themselves easily.
[GitHub] spark pull request: [SPARK-15078][SQL] Add all TPCDS 1.4 benchmark...
Github user sameeragarwal commented on a diff in the pull request: https://github.com/apache/spark/pull/13188#discussion_r63992090 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/tpcds/TPCDSQueryBenchmark.scala --- @@ -0,0 +1,132 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.benchmark.tpcds + +import java.io.File + +import org.apache.spark.SparkConf +import org.apache.spark.sql.SparkSession +import org.apache.spark.sql.catalyst.TableIdentifier +import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation +import org.apache.spark.sql.catalyst.expressions.SubqueryExpression +import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan +import org.apache.spark.sql.catalyst.util._ +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.util.Benchmark + +/** + * Benchmark to measure TPCDS query performance. 
+ * To run this: + * spark-submit --class --jars + */ +object TPCDSQueryBenchmark { + val conf = +new SparkConf() + .setMaster("local[1]") + .setAppName("test-sql-context") + .set("spark.sql.parquet.compression.codec", "snappy") + .set("spark.sql.shuffle.partitions", "4") + .set("spark.driver.memory", "3g") + .set("spark.executor.memory", "3g") + .set("spark.sql.autoBroadcastJoinThreshold", (20 * 1024 * 1024).toString) + + val spark = SparkSession.builder.config(conf).getOrCreate() + + val tables = Seq("catalog_page", "catalog_returns", "customer", "customer_address", +"customer_demographics", "date_dim", "household_demographics", "inventory", "item", +"promotion", "store", "store_returns", "catalog_sales", "web_sales", "store_sales", +"web_returns", "web_site", "reason", "call_center", "warehouse", "ship_mode", "income_band", +"time_dim", "web_page") + + def setupTables(dataLocation: String): Map[String, Long] = { +tables.map { tableName => + spark.read.parquet(s"$dataLocation/$tableName").createOrReplaceTempView(tableName) + tableName -> spark.table(tableName).count() +}.toMap + } + + def tpcdsAll(dataLocation: String, queries: Seq[String]): Unit = { +require(dataLocation.nonEmpty, + "please modify the value of dataLocation to point to your local TPCDS data") +val tableSizes = setupTables(dataLocation) +spark.conf.set(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key, "true") +spark.conf.set(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key, "true") +queries.foreach { name => + val queriesString = fileToString(new File(s"sql/core/src/test/scala/org/apache/spark/sql/" + +s"execution/benchmark/tpcds/queries/$name.sql")) + + // This is an indirect hack to estimate the size of each query's input by traversing the + // logical plan and adding up the sizes of all tables that appear in the plan. Note that this + // currently doesn't take WITH subqueries into account which might lead to fairly inaccurate + // per-row processing time for those cases. 
+ val queryRelations = scala.collection.mutable.HashSet[String]() + spark.sql(queriesString).queryExecution.logical.map { +case ur @ UnresolvedRelation(t: TableIdentifier, _) => + queryRelations.add(t.table) +case lp: LogicalPlan => + lp.expressions.foreach { _ foreach { +case subquery: SubqueryExpression => + subquery.plan.foreach { +case ur @ UnresolvedRelation(t: TableIdentifier, _) => + queryRelations.add(t.table) +case _ => + } +case _ => + } +} +case _ => + } + val numRows = queryRelations.map(tableSizes.getOrElse(_, 0L)).sum + val benchmark = new Benchmark("TPCDS Snappy", numRows, 5) +
[GitHub] spark pull request: [SPARK-15431][SQL] Support LIST FILE(s)|JAR(s)...
Github user xwu0226 commented on a diff in the pull request: https://github.com/apache/spark/pull/13212#discussion_r63992055

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1387,6 +1387,27 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
   }

   /**
+   * Return a list of file paths that are added to resources.
+   * If file paths are provided, return the ones that are added to resources.
+   */
+  def listFiles(files: Seq[String] = Seq.empty[String]): Seq[String] = {
--- End diff --

@rxin Just one concern about this one. It is possible that users just invoked listFiles or listJars directly with sparkContext. Do we want to provide filtering for this case? Right now, I have a [test case](https://github.com/xwu0226/spark/blob/21b092ab84b22abec93fde1fc1ca177db68d9f0d/core/src/test/scala/org/apache/spark/SparkContextSuite.scala#L159-L176) that covers this case.
[GitHub] spark pull request: [SPARK-14990][SQL] Fix checkForSameTypeInputEx...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/13208#issuecomment-220518940 Merging in master/2.0.
[GitHub] spark pull request: [SPARK-15114][SQL] Column name generated by ty...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/13045#discussion_r63992015

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Column.scala ---
@@ -37,6 +38,14 @@ private[sql] object Column {
   def apply(expr: Expression): Column = new Column(expr)

   def unapply(col: Column): Option[Expression] = Some(col.expr)
+
+  private[sql] def generateAlias(e: Expression, index: Int): String = {
+    e match {
+      case a: AggregateExpression if a.aggregateFunction.isInstanceOf[TypedAggregateExpression] =>
+        s"${a.aggregateFunction.prettyName}_c${index}"
--- End diff --

ok.. let me get the output for you and paste it here so it's easier to decide.
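The naming scheme in the diff above concatenates the aggregate function's pretty name with a positional suffix, producing aliases such as `typedsumdouble_c1`. A tiny illustrative sketch (the lowercasing step is an assumption here; in the real diff `prettyName` is used as-is):

```python
# Hypothetical version of the alias generator discussed above:
# "<pretty name of the aggregate function>_c<positional index>".
def generate_alias(pretty_name, index):
    return f"{pretty_name.lower()}_c{index}"
```

This is what makes two different typed aggregates over the same Dataset distinguishable by position, e.g. `_c1` and `_c2`, which is the behaviour debated later in this thread.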
[GitHub] spark pull request: [SPARK-14990][SQL] Fix checkForSameTypeInputEx...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/13208#issuecomment-220518610 LGTM
[GitHub] spark pull request: [SPARK-15363][ML][Example]:Example code should...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13213#issuecomment-220518293 **[Test build #58941 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58941/consoleFull)** for PR 13213 at commit [`818dc7f`](https://github.com/apache/spark/commit/818dc7fe8f1be835243de8d096d43b229e356cbc).
[GitHub] spark pull request: [SPARK-15363][ML][Example]:Example code should...
GitHub user wangmiao1981 opened a pull request: https://github.com/apache/spark/pull/13213 [SPARK-15363][ML][Example]: Example code shouldn't use VectorImplicits._, asML/fromML ## What changes were proposed in this pull request? In this DataFrame example, we use VectorImplicits._, which is a private API. Since the Vectors object has a public API, we use Vectors.fromML instead of the implicits. ## How was this patch tested? Manually ran the example. You can merge this pull request into a Git repository by running: $ git pull https://github.com/wangmiao1981/spark ml Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13213.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13213 commit 818dc7fe8f1be835243de8d096d43b229e356cbc Author: wm...@hotmail.com Date: 2016-05-20T05:00:35Z remove VectorImplicits in example
[GitHub] spark pull request: [SPARK-15379][SQL] check special invalid date
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/13169#issuecomment-220518098 LGTM, except for some minor comments. Thanks for working on it!
[GitHub] spark pull request: [SPARK-15379][SQL] check special invalid date
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13169#discussion_r63991642 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala --- @@ -353,6 +353,20 @@ class DateTimeUtilsSuite extends SparkFunSuite { c.getTimeInMillis * 1000 + 123456) } + test("SPARK-15379 :special invalid date string") { --- End diff -- nit: `SPARK-15379: special ...`
[GitHub] spark pull request: [SPARK-15114][SQL] Column name generated by ty...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/13045#discussion_r63991554 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetAggregatorSuite.scala --- @@ -240,4 +240,15 @@ class DatasetAggregatorSuite extends QueryTest with SharedSQLContext { val df2 = Seq(1 -> "a", 2 -> "b", 3 -> "b").toDF("i", "j") checkAnswer(df2.agg(RowAgg.toColumn as "b").select("b"), Row(6) :: Nil) } + + test("spark-15114 shorter system generated alias names") { +val ds = Seq(1, 3, 2, 5).toDS() +assert(ds.select(typed.sum((i: Int) => i)).columns.head === "typedsumdouble_c1") +val ds2 = ds.select(typed.sum((i: Int) => i), typed.avg((i: Int) => i)) +assert(ds2.columns.head === "typedsumdouble_c1") --- End diff -- @cloud-fan Just wanted to show the user some difference between two aggregate expressions: e.g. sum(col1) and sum(col2) will show up as typedsumdouble_c1 and typedsumdouble_c2. Do you think it's fine to just report the name without any suffix? If you think it's OK, then maybe we can just create resolved Aliases in Column.named as opposed to deferring it to the Analyzer? Please let me know.
[GitHub] spark pull request: [SPARK-15428][SQL] Disable multiple streaming ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13210#issuecomment-220517869 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58936/ Test FAILed.
[GitHub] spark pull request: [SPARK-15428][SQL] Disable multiple streaming ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13210#issuecomment-220517868 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-15428][SQL] Disable multiple streaming ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13210#issuecomment-220517778 **[Test build #58936 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58936/consoleFull)** for PR 13210 at commit [`65d45a9`](https://github.com/apache/spark/commit/65d45a947e905ee14fd8a7556032dd5035182648). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15379][SQL] check special invalid date
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13169#discussion_r63991470 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala --- @@ -426,6 +426,26 @@ object DateTimeUtils { } /** + * Return true if the date is invalid. + */ + private def checkInvalidDate(year: Int, month: Int, day: Int): Boolean = { --- End diff -- nit: as it returns boolean, I think `isInvalidDate` is a better name
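For illustration (not part of the PR under review), a helper with the suggested `isInvalidDate` name could look like the following minimal sketch. The field-range and leap-year logic here is a hypothetical simplification; the actual patch in DateTimeUtils may differ.

```scala
// Hypothetical sketch of the validity check discussed above: a date is
// invalid if any field is out of range, taking leap years into account.
object DateCheck {
  private val daysInMonth = Array(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)

  private def isLeapYear(year: Int): Boolean =
    (year % 4 == 0 && year % 100 != 0) || year % 400 == 0

  def isInvalidDate(year: Int, month: Int, day: Int): Boolean = {
    if (month < 1 || month > 12 || day < 1) return true
    // February gains a day in leap years; all other months have fixed lengths.
    val maxDay = if (month == 2 && isLeapYear(year)) 29 else daysInMonth(month - 1)
    day > maxDay
  }
}
```

With this shape, `isInvalidDate(2015, 2, 29)` is true while `isInvalidDate(2016, 2, 29)` is false, matching the "special invalid date" cases the test suite targets.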
[GitHub] spark pull request: [SPARK-15431][SQL] Support LIST FILE(s)|JAR(s)...
Github user xwu0226 commented on a diff in the pull request: https://github.com/apache/spark/pull/13212#discussion_r63991418 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/resources.scala --- @@ -46,3 +46,33 @@ case class AddFile(path: String) extends RunnableCommand { Seq.empty[Row] } } + +/** + * Return a list of file paths that are added to resources. + * If file paths are provided, return the ones that are added to resources. + */ +case class ListFiles(files: Seq[String] = Seq.empty[String]) extends RunnableCommand { --- End diff -- @rxin Thank you very much! I will make the change.
[GitHub] spark pull request: [SPARK-15431][SQL] Support LIST FILE(s)|JAR(s)...
Github user xwu0226 commented on a diff in the pull request: https://github.com/apache/spark/pull/13212#discussion_r63991451 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -1387,6 +1387,27 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli } /** +* Return a list of file paths that are added to resources. +* If file paths are provided, return the ones that are added to resources. +*/ + def listFiles(files: Seq[String] = Seq.empty[String]): Seq[String] = { --- End diff -- Agree. Thanks!
[GitHub] spark pull request: [SPARK-15431][SQL] Support LIST FILE(s)|JAR(s)...
Github user xwu0226 commented on a diff in the pull request: https://github.com/apache/spark/pull/13212#discussion_r63991431 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/resources.scala --- @@ -46,3 +46,33 @@ case class AddFile(path: String) extends RunnableCommand { Seq.empty[Row] } } + +/** + * Return a list of file paths that are added to resources. + * If file paths are provided, return the ones that are added to resources. + */ +case class ListFiles(files: Seq[String] = Seq.empty[String]) extends RunnableCommand { + override val output: Seq[Attribute] = { +val schema = StructType( + StructField("result", StringType, nullable = false) :: Nil) +schema.toAttributes + } + override def run(sparkSession: SparkSession): Seq[Row] = { +sparkSession.sparkContext.listFiles(files).map(Row(_)) + } +} + +/** + * Return a list of jar files that are added to resources. + * If jar files are provided, return the ones that are added to resources. + */ +case class ListJars(jars: Seq[String] = Seq.empty[String]) extends RunnableCommand { --- End diff -- Will change!
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13156#issuecomment-220517365 **[Test build #58940 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58940/consoleFull)** for PR 13156 at commit [`20d5055`](https://github.com/apache/spark/commit/20d50556c6a3a4ca2d69f961822a2bb058edbbec).
[GitHub] spark pull request: [SPARK-15431][SQL] Support LIST FILE(s)|JAR(s)...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/13212#discussion_r63991280 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -1387,6 +1387,27 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli } /** +* Return a list of file paths that are added to resources. +* If file paths are provided, return the ones that are added to resources. +*/ + def listFiles(files: Seq[String] = Seq.empty[String]): Seq[String] = { --- End diff -- i think this one should not take any parameter, and if you need filtering, just do it in ListFilesCommand
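For illustration (not from the PR itself), a minimal sketch of that suggestion, using hypothetical stand-in names rather than the real SparkContext API: the context exposes the full list with no parameter, and the command narrows the result itself.

```scala
// Hypothetical sketch: the context returns everything it knows about,
// and the command does any requested filtering.
class FakeContext(added: Seq[String]) {
  def listFiles(): Seq[String] = added  // no filtering parameter here
}

case class ListFilesCommand(files: Seq[String] = Seq.empty) {
  def run(ctx: FakeContext): Seq[String] = {
    val all = ctx.listFiles()
    if (files.isEmpty) all
    // Keep only the added paths that match one of the requested names.
    else all.filter(path => files.exists(f => path.endsWith(f)))
  }
}
```

This keeps the context API minimal while the SQL layer still supports `LIST FILES a.txt b.txt`-style lookups.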
[GitHub] spark pull request: [SPARK-15431][SQL] Support LIST FILE(s)|JAR(s)...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/13212#discussion_r63991284 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -1724,6 +1745,22 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli postEnvironmentUpdate() } + /** +* Return a list of jar files that are added to resources. +* If jar files are provided, return the ones that are added to resources. +*/ + def listJars(jars: Seq[String] = Seq.empty[String]): Seq[String] = { --- End diff -- ditto
[GitHub] spark pull request: [SPARK-15114][SQL] Column name generated by ty...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13045#discussion_r63991309 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Column.scala --- @@ -37,6 +38,14 @@ private[sql] object Column { def apply(expr: Expression): Column = new Column(expr) def unapply(col: Column): Option[Expression] = Some(col.expr) + + private[sql] def generateAlias(e: Expression, index: Int): String = { +e match { + case a: AggregateExpression if a.aggregateFunction.isInstanceOf[TypedAggregateExpression] => +s"${a.aggregateFunction.prettyName}_c${index}" --- End diff -- how about `aggregateFunction.toString`? It carries more information and not that verbose.
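For illustration (not part of the patch), the naming scheme under discussion amounts to the following standalone sketch, with the aggregate function's pretty name passed in as a plain string rather than extracted from a real `AggregateExpression`:

```scala
// Hypothetical standalone form of the alias scheme discussed above:
// the function's pretty name plus a positional suffix `_c<index>`.
def generateAlias(prettyName: String, index: Int): String =
  s"${prettyName}_c$index"
```

So the first typed sum in a select list would be named `typedsumdouble_c1`, the second `typedsumdouble_c2`, and so on, which is what the test in this PR asserts.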
[GitHub] spark pull request: [SPARK-15114][SQL] Column name generated by ty...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13045#discussion_r63991215 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetAggregatorSuite.scala --- @@ -240,4 +240,15 @@ class DatasetAggregatorSuite extends QueryTest with SharedSQLContext { val df2 = Seq(1 -> "a", 2 -> "b", 3 -> "b").toDF("i", "j") checkAnswer(df2.agg(RowAgg.toColumn as "b").select("b"), Row(6) :: Nil) } + + test("spark-15114 shorter system generated alias names") { +val ds = Seq(1, 3, 2, 5).toDS() +assert(ds.select(typed.sum((i: Int) => i)).columns.head === "typedsumdouble_c1") +val ds2 = ds.select(typed.sum((i: Int) => i), typed.avg((i: Int) => i)) +assert(ds2.columns.head === "typedsumdouble_c1") --- End diff -- I'm not sure how useful this `_c1` postfix is, maybe we can remove it and simplify the `aliasFunc`?
[GitHub] spark pull request: [SPARK-15431][SQL] Support LIST FILE(s)|JAR(s)...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/13212#discussion_r63991184 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/resources.scala --- @@ -46,3 +46,33 @@ case class AddFile(path: String) extends RunnableCommand { Seq.empty[Row] } } + +/** + * Return a list of file paths that are added to resources. + * If file paths are provided, return the ones that are added to resources. + */ +case class ListFiles(files: Seq[String] = Seq.empty[String]) extends RunnableCommand { + override val output: Seq[Attribute] = { +val schema = StructType( + StructField("result", StringType, nullable = false) :: Nil) +schema.toAttributes + } + override def run(sparkSession: SparkSession): Seq[Row] = { +sparkSession.sparkContext.listFiles(files).map(Row(_)) + } +} + +/** + * Return a list of jar files that are added to resources. + * If jar files are provided, return the ones that are added to resources. + */ +case class ListJars(jars: Seq[String] = Seq.empty[String]) extends RunnableCommand { --- End diff -- ListJarsCommand
[GitHub] spark pull request: [SPARK-15431][SQL] Support LIST FILE(s)|JAR(s)...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/13212#discussion_r63991179 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/resources.scala --- @@ -46,3 +46,33 @@ case class AddFile(path: String) extends RunnableCommand { Seq.empty[Row] } } + +/** + * Return a list of file paths that are added to resources. + * If file paths are provided, return the ones that are added to resources. + */ +case class ListFiles(files: Seq[String] = Seq.empty[String]) extends RunnableCommand { --- End diff -- ListFilesCommand
[GitHub] spark pull request: [SPARK-15075][SPARK-15345][SQL] Clean up Spark...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13200
[GitHub] spark pull request: [SPARK-15114][SQL] Column name generated by ty...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13045#discussion_r63991134 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala --- @@ -325,10 +325,13 @@ case class UnresolvedExtractValue(child: Expression, extraction: Expression) * Holds the expression that has yet to be aliased. * * @param child The computation that is needs to be resolved during analysis. - * @param aliasName The name if specified to be associated with the result of computing [[child]] + * @param aliasFunc The function if specified to be called to generate an alias to associate --- End diff -- we need to say more about the 2 parameters this `aliasFunc` takes.
[GitHub] spark pull request: [SPARK-15236][SQL][SPARK SHELL] Add spark-defa...
Github user xwu0226 commented on the pull request: https://github.com/apache/spark/pull/13088#issuecomment-220516958 @rxin @yhuai @andrewor14 Please help check if the updated change is in the right direction. Thank you very much!
[GitHub] spark pull request: [SPARK-15075][SPARK-15345][SQL] Clean up Spark...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/13200#issuecomment-220516955 Thanks - merging in master/2.0.
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/13156#issuecomment-220516949 retest this please
[GitHub] spark pull request: [SPARK-15114][SQL] Column name generated by ty...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13045#discussion_r63991063 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -174,14 +174,16 @@ class Analyzer( private def assignAliases(exprs: Seq[NamedExpression]) = { exprs.zipWithIndex.map { case (expr, i) => - expr.transformUp { case u @ UnresolvedAlias(child, optionalAliasName) => + expr.transformUp { case u @ UnresolvedAlias(child, optGenAliasFunc) => child match { case ne: NamedExpression => ne case e if !e.resolved => u case g: Generator => MultiAlias(g, Nil) case c @ Cast(ne: NamedExpression, _) => Alias(c, ne.name)() case e: ExtractValue => Alias(e, toPrettySQL(e))() - case e => Alias(e, optionalAliasName.getOrElse(toPrettySQL(e)))() + case e if optGenAliasFunc.isDefined => +Alias(child, s"${optGenAliasFunc.get.apply(e, i + 1)}")() --- End diff -- nit: we can just use `optGenAliasFunc.get.apply(e, i + 1)`, no need to wrap it with `s"${}"` ...
[GitHub] spark pull request: [SPARK-15313][SQL] EmbedSerializerInFilter rul...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/13096#issuecomment-220516852 Can you add the jira ticket somewhere as inline comment in the test case and in the analyzer code?
[GitHub] spark pull request: [SPARK-15075][SPARK-15345][SQL] Clean up Spark...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/13200#issuecomment-220516804 LGTM
[GitHub] spark pull request: [SPARK-15075][SPARK-15345][SQL] Clean up Spark...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/13200#issuecomment-220516531 @marmbrus I know you were looking at this. Did you end up going through it?
[GitHub] spark pull request: [SPARK-15431][SQL] Support LIST FILE(s)|JAR(s)...
Github user xwu0226 commented on the pull request: https://github.com/apache/spark/pull/13212#issuecomment-220516455 cc @yhuai @hvanhovell @gatorsmile Thanks!
[GitHub] spark pull request: [SPARK-15431][SQL] Support LIST FILE(s)|JAR(s)...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13212#issuecomment-220516397 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-11827] [SQL] Adding java.math.BigIntege...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/10125#issuecomment-220516202 thanks, merging to master and 2.0!
[GitHub] spark pull request: [SPARK-11827] [SQL] Adding java.math.BigIntege...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/10125
[GitHub] spark pull request: [SPARK-15075][SPARK-15345][SQL] Clean up Spark...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13200#issuecomment-220515969 **[Test build #58939 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58939/consoleFull)** for PR 13200 at commit [`e4a4bc1`](https://github.com/apache/spark/commit/e4a4bc1f590770ff95f3fb0277b3e0e8050cec72).
[GitHub] spark pull request: [SPARK-15431][SQL] Support LIST FILE(s)|JAR(s)...
GitHub user xwu0226 opened a pull request: https://github.com/apache/spark/pull/13212 [SPARK-15431][SQL] Support LIST FILE(s)|JAR(s) command natively
## What changes were proposed in this pull request?
Currently the command "ADD FILE|JAR" is supported natively in Spark SQL. However, when this command is run, the file/jar is added to resources that cannot be looked up by the "LIST FILE(s)|JAR(s)" command, because the LIST command is passed to the Hive command processor in Spark-SQL, or is simply not supported in Spark-shell. There is no way for users to find out what files/jars have been added to the Spark context. Refer to [Hive commands](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli). This PR is to support the following commands: `LIST (FILE[s] [filepath ...] | JAR[s] [jarfile ...])`
### For example:
# LIST FILE(s)
```
scala> spark.sql("add file hdfs://bdavm009.svl.ibm.com:8020/tmp/test.txt")
res1: org.apache.spark.sql.DataFrame = []

scala> spark.sql("add file hdfs://bdavm009.svl.ibm.com:8020/tmp/test1.txt")
res2: org.apache.spark.sql.DataFrame = []

scala> spark.sql("list file hdfs://bdavm009.svl.ibm.com:8020/tmp/test1.txt").show(false)
+----------------------------------------------+
|result                                        |
+----------------------------------------------+
|hdfs://bdavm009.svl.ibm.com:8020/tmp/test1.txt|
+----------------------------------------------+

scala> spark.sql("list files").show(false)
+----------------------------------------------+
|result                                        |
+----------------------------------------------+
|hdfs://bdavm009.svl.ibm.com:8020/tmp/test1.txt|
|hdfs://bdavm009.svl.ibm.com:8020/tmp/test.txt |
+----------------------------------------------+
```
# LIST JAR(s)
```
scala> spark.sql("add jar /Users/xinwu/spark/core/src/test/resources/TestUDTF.jar")
res9: org.apache.spark.sql.DataFrame = [result: int]

scala> spark.sql("list jar TestUDTF.jar").show(false)
+---------------------------------------------+
|result                                       |
+---------------------------------------------+
|spark://192.168.1.234:50131/jars/TestUDTF.jar|
+---------------------------------------------+

scala> spark.sql("list jars").show(false)
+---------------------------------------------+
|result                                       |
+---------------------------------------------+
|spark://192.168.1.234:50131/jars/TestUDTF.jar|
+---------------------------------------------+
```
## How was this patch tested?
New test cases are added for the Spark-SQL, Spark-shell and SparkContext API code paths.
You can merge this pull request into a Git repository by running: $ git pull https://github.com/xwu0226/spark list_command Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13212.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13212
commit 3866e3dcbfbd9fe0e18ecde3b23bb14757e06a0c Author: xin Wu Date: 2016-05-08T07:06:36Z spark-15206 add testcases for distinct aggregate in having clause following up PR12974
commit 951d3edc412ef3d6f77d70a4dd7dd7add966d7b1 Author: xin Wu Date: 2016-05-08T07:09:44Z Revert "spark-15206 add testcases for distinct aggregate in having clause following up PR12974" This reverts commit 98a1f804d7343ba77731f9aa400c00f1a26c03fe.
commit 5b30cc3c0eb20c134e21942ef96a26e452f9171c Author: xin Wu Date: 2016-05-17T22:09:57Z adding spark native support for LIST FILES/JARS
commit 6396ec1591134ca3fd754a6a2684bc8b81218951 Author: xin Wu Date: 2016-05-17T22:52:31Z update testcase
commit 79e97be7917d23f44f60cc857a471b14cb96831c Author: xin Wu Date: 2016-05-19T07:07:02Z support listing specific file(s)
commit a4dc6164ff51b428dae282aa90042758c4ae33d7 Author: Xin Wu Date: 2016-05-19T07:33:50Z update testcases
commit 688c294060cb00cd6c387591bf700e58bdd3dba8 Author: Xin Wu Date: 2016-05-19T22:57:16Z align with PR 13122
commit a0a76a3c5ff93dbf42f07bebd54b7a3514e87132 Author: Xin Wu Date: 2016-05-19T23:07:32Z code style
commit 923988ac5d21e0c0afc6bf76d21a27e8f46f1246 Author: Xin Wu Date: 2016-05-19T23:11:36Z code style
commit 21b092ab84b22abec93fde1fc1ca177db68d9f0d Author: Xin Wu Date: 2016-05-20T04:16:26Z update comments
[GitHub] spark pull request: [SPARK-15114][SQL] Column name generated by ty...
Github user dilipbiswal commented on the pull request: https://github.com/apache/spark/pull/13045#issuecomment-220515698 cc @cloud-fan Hi Wenchen, I have made the changes per your comments. Could you please look through it when you get a chance? Thanks.
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/13156#issuecomment-220515502 LGTM, pending jenkins
[GitHub] spark pull request: [SPARK-15425][SQL] Disallow cartesian joins by...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/13209#discussion_r63990349

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -348,6 +348,11 @@ object SQLConf {
     .booleanConf
     .createWithDefault(true)

+  val CARTESIAN_PRODUCT_ENABLED = SQLConfigBuilder("spark.sql.join.cartesian.enabled")
+    .doc("When false, we will throw an error if a query contains a cartesian product")
+    .booleanConf
+    .createWithDefault(false)
+
   val ORDER_BY_ORDINAL = SQLConfigBuilder("spark.sql.orderByOrdinal")
     .doc("When true, the ordinal numbers are treated as the position in the select list. " +
       "When false, the ordinal numbers in order/sort By clause are ignored.")
--- End diff --

no it's not this pr but @sameeragarwal can you fix it while you are at it?
[GitHub] spark pull request: [SPARK-8603][SPARKR] Incorrect file separator ...
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/13165#issuecomment-220515342

This raises some questions for me.

1. It seems several tests failed. Could you please share your thoughts?
2. I think I can add some tests now, but could you please suggest where I should write the related tests, and maybe rough ideas of the tests I should add?
[GitHub] spark pull request: [SPARK-8603][SPARKR] Incorrect file separator ...
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/13165#issuecomment-220515182

@sun-rui @felixcheung Right. It seems I finally made it. I made gists and uploaded a PDF file for the Spark UI. Let me tell you the test results first.

Here is the stdout output for the tests on Windows 7 32bit: [output.msg](https://gist.github.com/HyukjinKwon/6a10719d2ca67e04ece2b23a8f92dc62). Here is the stderr output for the tests on Windows 7 32bit: [output.err](https://gist.github.com/HyukjinKwon/54984d57ee18236d46e965d07b31f77a). Here is the PDF for the [Spark UI](https://drive.google.com/open?id=0B7RfLjRU7QTnVVA2bkVMVFkzNEE).

1. I ran the tests after building Spark on Windows according to [`./R/WINDOWS.md`](https://github.com/apache/spark/blob/master/R/WINDOWS.md).
2. It seems `$HADOOP_HOME` should be set.
3. It seems `winutils.exe` is required (it is included in the official Hadoop binary), even though the tests read files in the local file system.
4. I then ran the tests with the command below:

```bash
cd bin
spark-submit2.cmd --conf spark.hadoop.fs.defualt.name="file:///" ..\R\pkg\tests\run-all.R > output.msg 2> output.err
```
[GitHub] spark pull request: [SPARK-15321] Fix bug where Array[Timestamp] c...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13108
[GitHub] spark pull request: [SPARK-15321] Fix bug where Array[Timestamp] c...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/13108#issuecomment-220514860 LGTM, merging to master and 2.0, thanks!
[GitHub] spark pull request: [SPARK-15430][SQL] Fix potential ConcurrentMod...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13211#issuecomment-220514553 **[Test build #58938 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58938/consoleFull)** for PR 13211 at commit [`4d97bf0`](https://github.com/apache/spark/commit/4d97bf093f4f4d41cf530a4c7464532635c2b3fe).
[GitHub] spark pull request: [SPARK-15430][SQL] Fix potential ConcurrentMod...
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/13211 [SPARK-15430][SQL] Fix potential ConcurrentModificationException for ListAccumulator

## What changes were proposed in this pull request?

In `ListAccumulator` we create an unmodifiable view of the underlying list. However, this doesn't prevent the underlying list from being modified further. So while we access the unmodifiable view, the underlying list can be modified at the same time, which can cause a `java.util.ConcurrentModificationException`. We have observed such exceptions in recent tests.

To fix it, we can copy the underlying list and then create the unmodifiable view over the copy instead.

## How was this patch tested?

The exception might be difficult to test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 fix-concurrentmodify

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13211.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13211

commit 4d97bf093f4f4d41cf530a4c7464532635c2b3fe
Author: Liang-Chi Hsieh
Date: 2016-05-20T04:15:49Z

    Fix potential ConcurrentModificationException for ListAccumulator.
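The view-versus-copy distinction behind this fix can be sketched in plain Java. This is a minimal, hypothetical illustration of the `Collections.unmodifiableList` pitfall, not the actual `ListAccumulator` code:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class UnmodifiableViewDemo {
    public static void main(String[] args) {
        List<String> underlying = new ArrayList<>();

        // An unmodifiable *view* still reflects later writes to the
        // underlying list; iterating it while another thread appends
        // can throw ConcurrentModificationException.
        List<String> view = Collections.unmodifiableList(underlying);

        // The fix: snapshot the underlying list first, then wrap the copy.
        // The snapshot is isolated from subsequent writes.
        List<String> snapshot = Collections.unmodifiableList(new ArrayList<>(underlying));

        underlying.add("value");

        System.out.println(view.size());     // 1 -- the view sees the later write
        System.out.println(snapshot.size()); // 0 -- the snapshot does not
    }
}
```

The copy has a cost proportional to the list size on every read, which is the trade-off accepted here in exchange for a race-free view.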
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/13156#issuecomment-220513843 retest this please
[GitHub] spark pull request: [SPARK-15100][DOC] Modified user guide and exa...
Github user GayathriMurali commented on the pull request: https://github.com/apache/spark/pull/13176#issuecomment-220513197 @hhbyyh Can you please help review this? I will resolve the branch conflict along with review comments
[GitHub] spark pull request: [SPARK-15379][SQL] check special invalid date
Github user wangyang1992 commented on the pull request: https://github.com/apache/spark/pull/13169#issuecomment-220512727 @cloud-fan Could you please take a look at this when you have some time? It is a simple fix.
[GitHub] spark pull request: [SPARK-15360][Spark-Submit]Should print spark-...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13163#issuecomment-220512285 **[Test build #58937 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58937/consoleFull)** for PR 13163 at commit [`2941e62`](https://github.com/apache/spark/commit/2941e6273d064376f0e540fa0655c345d9c52461).
[GitHub] spark pull request: [SPARK-15360][Spark-Submit]Should print spark-...
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/13163#discussion_r63988573

--- Diff: launcher/src/test/java/org/apache/spark/launcher/SparkSubmitCommandBuilderSuite.java ---
@@ -59,6 +59,18 @@ public void testClusterCmdBuilder() throws Exception {
   }

   @Test
+  public void testCliHelpAndNoArg() throws Exception {
+    List<String> sparkSubmitArgs = Arrays.asList(parser.HELP);
+    Map<String, String> env = new HashMap<>();
+    List<String> cmd = buildCommand(sparkSubmitArgs, env);
+    assertTrue("--help should be contained in the final cmd.", cmd.contains(parser.HELP));
+
+    List<String> sparkEmptyArgs = Arrays.asList("");
+    cmd = buildCommand(sparkSubmitArgs, env);
--- End diff --

Sorry for this obvious mistake! It is really a stupid mistake. Thanks for your time!
[GitHub] spark pull request: [SPARK-15321] Fix bug where Array[Timestamp] c...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13108#issuecomment-220511906

**[Test build #2999 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2999/consoleFull)** for PR 13108 at commit [`387e6c9`](https://github.com/apache/spark/commit/387e6c912191bed1d4d4e09ede92f6ea1cc85a51).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15428][SQL] Disable multiple streaming ...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/13210#issuecomment-220511611 cc @marmbrus
[GitHub] spark pull request: [SPARK-15335] [SQL] Implement TRUNCATE TABLE C...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13170#issuecomment-220511333 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58935/ Test PASSed.
[GitHub] spark pull request: [SPARK-15335] [SQL] Implement TRUNCATE TABLE C...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13170#issuecomment-220511331 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15428][SQL] Disable multiple streaming ...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/13210#discussion_r63988111

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala ---
@@ -55,10 +55,19 @@ object UnsupportedOperationChecker {
       case _: InsertIntoTable =>
         throwError("InsertIntoTable is not supported with streaming DataFrames/Datasets")

-      case Aggregate(_, _, child) if child.isStreaming && outputMode == Append =>
-        throwError(
-          "Aggregations are not supported on streaming DataFrames/Datasets in " +
-            "Append output mode. Consider changing output mode to Update.")
+      case Aggregate(_, _, child) if child.isStreaming =>
+        if (outputMode == Append) {
+          throwError(
+            "Aggregations are not supported on streaming DataFrames/Datasets in " +
+              "Append output mode. Consider changing output mode to Update.")
--- End diff --

I didn't get you. IntelliJ seems to be catching all the uses of the Append object properly.
[GitHub] spark pull request: [SPARK-15335] [SQL] Implement TRUNCATE TABLE C...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13170#issuecomment-220511206

**[Test build #58935 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58935/consoleFull)** for PR 13170 at commit [`10377ba`](https://github.com/apache/spark/commit/10377ba78f26d9aa42502d0b5cfeea561ff96162).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.