[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

2017-02-10 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16664 Basically I see no reason to add some specific parameter to a listener API that is meant to be generic which already contains reference to QueryExecution. What are you going to do if next time you

[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

2017-02-10 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16664 I think that's a separate "bug" we should fix, i.e. DataFrameWriter should use InsertIntoDataSourceCommand so we can consolidate the two paths. --- If your project is set up for it, you

[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

2017-02-10 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16664 Well it does. It contains the entire plan.

[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

2017-02-10 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16664 That's probably because you are not familiar with the SQL component. The existing API already has references to the QueryExecution object, which actually includes all of the information your

[GitHub] spark issue #16885: Encryption of shuffle files

2017-02-10 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16885 Thanks - merging in master.

[GitHub] spark issue #16887: [SPARK-19549] Allow providing reason for stage/job cance...

2017-02-10 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16887 LGTM pending Jenkins.

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

2017-02-10 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16664#discussion_r100565585 --- Diff: docs/sql-programming-guide.md --- @@ -1300,10 +1300,28 @@ Configuration of in-memory caching can be done using the `setConf` method on `Sp

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

2017-02-10 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16664#discussion_r100565522 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/util/QueryExecutionListener.scala --- @@ -44,27 +44,50 @@ trait QueryExecutionListener

[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...

2017-02-10 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16664#discussion_r100564925 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala --- @@ -218,7 +247,14 @@ final class DataFrameWriter[T] private[sql](ds: Dataset

[GitHub] spark issue #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener callback...

2017-02-10 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16664 Sorry I'm really confused, probably because I haven't kept track with this pr. But the diff doesn't match the pr description. Are we fixing a bug here or introducing a bunch of new APIs

[GitHub] spark pull request #16887: [SPARK-19549] Allow providing reason for stage/jo...

2017-02-10 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16887#discussion_r100552660 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -696,9 +696,9 @@ class DAGScheduler( /** * Cancel a job

[GitHub] spark pull request #16887: [SPARK-19549] Allow providing reason for stage/jo...

2017-02-10 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16887#discussion_r100552370 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -2207,20 +2207,22 @@ class SparkContext(config: SparkConf) extends Logging

[GitHub] spark pull request #16864: [SPARK-19527][Core] Approximate Size of Intersect...

2017-02-10 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16864#discussion_r100503141 --- Diff: common/sketch/src/main/java/org/apache/spark/util/sketch/BloomFilter.java --- @@ -81,6 +81,11 @@ int getVersionNumber() { public abstract

[GitHub] spark issue #16875: [BACKPORT-2.1][SPARK-19512][SQL] codegen for compare str...

2017-02-10 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16875 Merging in branch-2.1.

[GitHub] spark issue #16875: [BACKPORT-2.1][SPARK-19512][SQL] codegen for compare str...

2017-02-10 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16875 @bogdanrdc can you close this? It won't auto close because it is not merged in master.

[GitHub] spark pull request #16872: [SPARK-19514] Making range interruptible.

2017-02-09 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16872#discussion_r100396033 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameRangeSuite.scala --- @@ -127,4 +133,28 @@ class DataFrameRangeSuite extends QueryTest

[GitHub] spark issue #16864: [SPARK-19527][Core] Approximate Size of Intersection of ...

2017-02-09 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16864 I meant just union, but createUnion ...

[GitHub] spark issue #16872: [SPARK-19514] Making range interruptible.

2017-02-09 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16872 I'm going to merge this in master. If we find a way to optimize the test we can do a follow-up pr.

[GitHub] spark issue #16872: [SPARK-19514] Making range interruptible.

2017-02-09 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16872 LGTM

[GitHub] spark pull request #16871: [SPARK-19493][BUILD][CORE][WIP] Remove Java 7 sup...

2017-02-09 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16871#discussion_r100287048 --- Diff: build/mvn --- @@ -22,7 +22,7 @@ _DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )" # Preserve t

[GitHub] spark pull request #16871: [SPARK-19493][BUILD][CORE][WIP] Remove Java 7 sup...

2017-02-09 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16871#discussion_r100287082 --- Diff: core/src/test/java/org/apache/spark/Java8RDDAPISuite.java --- @@ -15,7 +15,7 @@ * limitations under the License. */ -package

[GitHub] spark pull request #16871: [SPARK-19493][BUILD][CORE][WIP] Remove Java 7 sup...

2017-02-09 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16871#discussion_r100284451 --- Diff: core/src/test/java/org/apache/spark/Java8RDDAPISuite.java --- @@ -15,7 +15,7 @@ * limitations under the License. */ -package

[GitHub] spark pull request #16871: [SPARK-19493][BUILD][CORE][WIP] Remove Java 7 sup...

2017-02-09 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16871#discussion_r100284373 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -1910,31 +1908,7 @@ private[spark] object Utils extends Logging { * @return

[GitHub] spark issue #16871: [SPARK-19493][BUILD][CORE][WIP] Remove Java 7 support

2017-02-09 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16871 With this, what's the behavior if users use a Java 7 runtime to run Spark? What kind of errors do we generate?

[GitHub] spark pull request #16871: [SPARK-19493][BUILD][CORE][WIP] Remove Java 7 sup...

2017-02-09 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16871#discussion_r100284098 --- Diff: build/mvn --- @@ -22,7 +22,7 @@ _DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )" # Preserve t

[GitHub] spark issue #16864: [SPARK-19527][Core] Approximate Size of Intersection of ...

2017-02-09 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16864 cc @mengxr / @tjhunter / @jkbradley is this good to have?

[GitHub] spark pull request #16864: [SPARK-19527][Core] Approximate Size of Intersect...

2017-02-09 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16864#discussion_r100261227 --- Diff: common/sketch/src/main/java/org/apache/spark/util/sketch/BloomFilter.java --- @@ -81,6 +81,11 @@ int getVersionNumber() { public abstract

[GitHub] spark pull request #16864: [SPARK-19527][Core] Approximate Size of Intersect...

2017-02-09 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16864#discussion_r100261151 --- Diff: common/sketch/src/main/java/org/apache/spark/util/sketch/BloomFilter.java --- @@ -148,6 +153,20 @@ int getVersionNumber() { public abstract

[GitHub] spark pull request #16864: [SPARK-19527][Core] Approximate Size of Intersect...

2017-02-09 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16864#discussion_r100261088 --- Diff: common/sketch/src/main/java/org/apache/spark/util/sketch/IncompatibleUnionException.java --- @@ -0,0 +1,24 @@ +/* + * Licensed

[GitHub] spark issue #16826: Fork SparkSession with option to inherit a copy of the S...

2017-02-09 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16826 @kunalkhamar you should create a JIRA ticket for this. In addition, I'm not a big fan of the design to pass a base session in. It'd be simpler if there is just a clone method on sessionstate

[GitHub] spark pull request #16826: Fork SparkSession with option to inherit a copy o...

2017-02-09 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16826#discussion_r100255729 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala --- @@ -213,6 +218,24 @@ class SparkSession private( new SparkSession

[GitHub] spark issue #16856: [SPARK-19516][DOC] update public doc to use SparkSession...

2017-02-09 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16856 I think the issue is that the programming guide should probably switch over to the DataFrame one as the primary one, and then the RDD one as a RDD programming guide. cc @matei for his input

[GitHub] spark issue #16810: [SPARK-19464][CORE][YARN][test-hadoop2.6] Remove support...

2017-02-08 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16810 https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-2.6/3810/

[GitHub] spark issue #16810: [SPARK-19464][CORE][YARN][test-hadoop2.6] Remove support...

2017-02-08 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16810 Did we break the build?

[GitHub] spark issue #16837: [SPARK-19359][SQL] renaming partition should not leave u...

2017-02-07 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16837 Does this change not require changing the other external catalog?

[GitHub] spark issue #16835: [SPARK-19495][SQL] Make SQLConf slightly more extensible

2017-02-07 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16835 Merging in master.

[GitHub] spark pull request #16835: [SPARK-19495][SQL] Make SQLConf slightly more ext...

2017-02-07 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/16835 [SPARK-19495][SQL] Make SQLConf slightly more extensible ## What changes were proposed in this pull request? This pull request makes SQLConf slightly more extensible by removing the visibility

[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

2017-02-07 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16594 ok here is an idea how about ``` explain stats xxx ``` as the way to add stats?

[GitHub] spark issue #16829: [SPARK-19447] Fixing input metrics for range operator.

2017-02-07 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16829 Merging in master. Thanks.

[GitHub] spark issue #16829: [SPARK-19447] Fixing input metrics for range operator.

2017-02-07 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16829 LGTM

[GitHub] spark issue #16832: [SPARK-19490][SQL] change hive column names to lower cas...

2017-02-07 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16832 hm is it safe to just do this change?

[GitHub] spark issue #16826: Fork SparkSession with option to inherit a copy of the S...

2017-02-07 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16826 What is the semantics? Do functions/settings on the base SparkSession affect the new forked?

[GitHub] spark issue #16791: [SPARK-19409][SPARK-17213] Cleanup Parquet workarounds/h...

2017-02-06 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16791 Merging in master.

[GitHub] spark pull request #16796: [SPARK-10063] Follow-up: remove dead code related...

2017-02-03 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/16796 [SPARK-10063] Follow-up: remove dead code related to an old output committer. ## What changes were proposed in this pull request? DirectParquetOutputCommitter was removed from Spark

[GitHub] spark issue #16792: [SPARK-19453][PYTHON][SQL][DOC] Correct and extend DataF...

2017-02-03 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16792 lgtm

[GitHub] spark issue #16756: [SPARK-19411][SQL] Remove the metadata used to mark opti...

2017-02-03 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16756 Merging in master.

[GitHub] spark issue #16751: [SPARK-19409][BUILD] Bump parquet version to 1.8.2

2017-01-31 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16751 can you put rest of the cleanups in one place?

[GitHub] spark issue #16751: [SPARK-19409][BUILD] Bump parquet version to 1.8.2

2017-01-31 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16751 Merging in master.

[GitHub] spark issue #16742: [SPARK-19403][PYTHON][SQL] Correct pyspark.sql.column.__...

2017-01-30 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16742 Merging in master.

[GitHub] spark issue #16742: [SPARK-19403][PYTHON][SQL] Correct pyspark.sql.column.__...

2017-01-30 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16742 LGTM, but can you update your description: ``` This removes from the __all__ list class names that are not defined (visible) in the pyspark.sql.column. ``` Your current

[GitHub] spark issue #16731: [SPARK-19393][SQL] Add `approx_percentile` Dataset/DataF...

2017-01-30 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16731 to be honest I really hate it with Scala/Java when we need to add so many functions just for a single function. Can we just tell users to use `expr("approx_percentile(...)")`?

[GitHub] spark issue #16533: [SPARK-19160][PYTHON][SQL] Add udf decorator

2017-01-30 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16533 Can return type also take a string?

[GitHub] spark issue #16534: [SPARK-19161][PYTHON][SQL] Improving UDF Docstrings

2017-01-26 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16534 Is the goal to change the doc or the repl string? It might be useful to change the repl string but I'm not sure if it is worth changing the doc.

[GitHub] spark issue #16708: [SPARK-19366][SQL] add getNumPartitions to Dataset

2017-01-26 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16708 Yes.

[GitHub] spark issue #16708: [SPARK-19366][SQL] add getNumPartitions to Dataset

2017-01-26 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16708 Basically I want to push back against exposing this as a public API ...

[GitHub] spark issue #16707: [SPARK-19338][SQL] Add UDF names in explain

2017-01-25 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16707 LGTM pending jenkins.

[GitHub] spark issue #16708: [SPARK-19366][SQL] add getNumPartitions to Dataset

2017-01-25 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16708 Actually - why do we need this? I worry it can be a confusing API due to optimizer behavior.

[GitHub] spark pull request #16708: [SPARK-19366][SQL] add getNumPartitions to Datase...

2017-01-25 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16708#discussion_r97935710 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -2421,6 +2421,13 @@ class Dataset[T] private[sql

[GitHub] spark issue #16707: [SPARK-19338][SQL] Add UDF names in explain

2017-01-25 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16707 Maybe add a prefix so it is clear a UDF? e.g. `UDF:func_name(...)`

[GitHub] spark issue #16702: [SPARK-18495][UI] Document meaning of green dot in DAG v...

2017-01-25 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16702 Thanks - merging in master.

[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

2017-01-22 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16594 sorry this explain plan makes no sense -- it is impossible to read.

[GitHub] spark issue #16637: [SPARK-19225][SQL]round decimal return normal value but ...

2017-01-19 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16637 Also I think we need to update the code gen path as well.

[GitHub] spark issue #16637: [SPARK-19225][SQL]round decimal return normal value but ...

2017-01-19 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16637 Can you add a test?

[GitHub] spark issue #16633: [SPARK-19274][SQL] Make GlobalLimit without shuffling da...

2017-01-18 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16633 This breaks the RDD job chain doesn't it?

[GitHub] spark issue #12004: [SPARK-7481] [build] Add spark-cloud module to pull in o...

2017-01-18 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/12004 I've pointed out this before, and again: FWIW I really don't see what this pull request is trying to accomplish

[GitHub] spark issue #16622: [SPARK-18917][SQL] Remove schema check in appending data

2017-01-17 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16622 Merging in master.

[GitHub] spark pull request #16611: [WIP][SPARK-17967][SPARK-17878][SQL][PYTHON] Supp...

2017-01-17 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16611#discussion_r96532765 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/sources/SimpleTextHadoopFsRelationSuite.scala --- @@ -69,18 +69,19 @@ class

[GitHub] spark pull request #16622: [SPARK-18917][SQL] Remove schema check in appendi...

2017-01-17 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/16622 [SPARK-18917][SQL] Remove schema check in appending data ## What changes were proposed in this pull request? In append mode, we check whether the schema of the write is compatible with the schema

[GitHub] spark issue #16339: [SPARK-18917][SQL] Add Skip Partition Check Flag to avoi...

2017-01-17 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16339 I submitted a pr here https://github.com/apache/spark/pull/16622

[GitHub] spark pull request #16308: [SPARK-18936][SQL] Infrastructure for session loc...

2017-01-17 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16308#discussion_r96482971 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala --- @@ -30,21 +30,42 @@ import

[GitHub] spark issue #16483: [SPARK-18847][GraphX] PageRank gives incorrect results f...

2017-01-17 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16483 cc @ankurdave

[GitHub] spark issue #16611: [SPARK-17967][SPARK-17878][SQL][PYTHON] Support for arra...

2017-01-17 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16611 Rather than just submitting code, can you put down the interfaces concisely either in a doc or the pr description? As @falaki said, we need this to work in DDL too. It is possible to just extend

[GitHub] spark issue #16585: [SPARK-19223][SQL][PySpark] Fix InputFileBlockHolder for...

2017-01-16 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16585 BTW please add a test case for this. Thanks.

[GitHub] spark pull request #16598: [SPARK-19236] Added createOrReplaceGlobalTempView...

2017-01-16 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16598#discussion_r96320999 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -2603,6 +2603,21 @@ class Dataset[T] private[sql]( def createGlobalTempView

[GitHub] spark pull request #16598: [SPARK-19236] Added createOrReplaceGlobalTempView...

2017-01-16 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16598#discussion_r96320992 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -2603,6 +2603,21 @@ class Dataset[T] private[sql]( def createGlobalTempView

[GitHub] spark pull request #16591: [SPARK-19227][CORE] remove unused imports and out...

2017-01-16 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16591#discussion_r96320937 --- Diff: core/src/main/scala/org/apache/spark/executor/OutputMetrics.scala --- @@ -20,7 +20,6 @@ package org.apache.spark.executor import

[GitHub] spark issue #16585: [SPARK-19223][SQL][PySpark] Fix InputFileBlockHolder for...

2017-01-16 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16585 should the proper fix be the python thread transfers the proper information over?

[GitHub] spark issue #16595: [Minor][YARN] Move YarnSchedulerBackendSuite to resource...

2017-01-16 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16595 LGTM

[GitHub] spark pull request #16339: [SPARK-18917][SQL] Add Skip Partition Check Flag ...

2017-01-16 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16339#discussion_r96304206 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala --- @@ -445,21 +445,28 @@ case class DataSource

[GitHub] spark pull request #16608: [SPARK-13721][SQL] Support outer generators in Da...

2017-01-16 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16608#discussion_r96294261 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1621,9 +1621,11 @@ class Analyzer

[GitHub] spark pull request #16608: [SPARK-13721][SQL] Support outer generators in Da...

2017-01-16 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16608#discussion_r96294175 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/GeneratorFunctionSuite.scala --- @@ -86,13 +86,25 @@ class GeneratorFunctionSuite extends QueryTest

[GitHub] spark pull request #16608: [SPARK-13721][SQL] Support outer generators in Da...

2017-01-16 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16608#discussion_r96294115 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/GeneratorFunctionSuite.scala --- @@ -86,13 +86,25 @@ class GeneratorFunctionSuite extends QueryTest

[GitHub] spark pull request #16308: [SPARK-18936][SQL] Infrastructure for session loc...

2017-01-15 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16308#discussion_r96171901 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/finishAnalysis.scala --- @@ -41,13 +46,18 @@ object ReplaceExpressions extends

[GitHub] spark issue #16499: [SPARK-17204][CORE] Fix replicated off heap storage

2017-01-14 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16499 also cc @sameeragarwal

[GitHub] spark pull request #16581: [SPARK-18589] [SQL] Fix Python UDF accessing attr...

2017-01-13 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16581#discussion_r9630 --- Diff: python/pyspark/sql/tests.py --- @@ -342,6 +342,14 @@ def test_udf_in_filter_on_top_of_outer_join(self): df = df.withColumn('b', udf

[GitHub] spark pull request #16581: [SPARK-18589] [SQL] Fix Python UDF accessing attr...

2017-01-13 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16581#discussion_r9624 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala --- @@ -86,6 +86,19 @@ trait PredicateHelper

[GitHub] spark issue #16558: Fix missing close-parens for In filter's toString

2017-01-12 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16558 Alright i'm going to merge this given JIRA is down ... merging in master/branch-2.1/branch-2.0.

[GitHub] spark issue #16568: [SPARK-18971][Core]Upgrade Netty to 4.0.43.Final

2017-01-12 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16568 LGTM

[GitHub] spark issue #16404: [SPARK-18969][SQL] Support grouping by nondeterministic ...

2017-01-12 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16404 Make sure you update the pull request and jira ticket description before you merge.

[GitHub] spark issue #16404: [SPARK-18969][SQL] Support grouping by nondeterministic ...

2017-01-12 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16404 LGTM on the behavior

[GitHub] spark issue #16559: [WIP] Add expression index and test cases

2017-01-12 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16559 Do we not have something similar already? cc @cloud-fan

[GitHub] spark issue #16538: [SPARK-19164][PYTHON][SQL] Remove unused UserDefinedFunc...

2017-01-12 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16538 Thanks - merging in master.

[GitHub] spark issue #16558: Fix missing close-parens for In filter's toString

2017-01-12 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16558 Oops - LGTM pending tests.

[GitHub] spark pull request #16395: [SPARK-17075][SQL] implemented filter estimation

2017-01-11 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16395#discussion_r95726607 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala --- @@ -0,0 +1,555

[GitHub] spark pull request #16554: [SPARK-19183] [SQL] Add deleteWithJob hook to int...

2017-01-11 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16554#discussion_r95702468 --- Diff: core/src/main/scala/org/apache/spark/internal/io/FileCommitProtocol.scala --- @@ -112,6 +113,15 @@ abstract class FileCommitProtocol { * just

[GitHub] spark issue #16551: [SPARK-19132] [SQL] Add test cases for row size estimati...

2017-01-11 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16551 Thanks - merging in master.

[GitHub] spark issue #16544: [SPARK-19149][SQL] Follow-up: simplify cache implementat...

2017-01-11 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16544 Merging in master.

[GitHub] spark pull request #16308: [SPARK-18936][SQL] Infrastructure for session loc...

2017-01-11 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16308#discussion_r95686129 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala --- @@ -111,7 +112,8 @@ case class CatalogTablePartition

[GitHub] spark issue #16541: [SPARK-19088][SQL] Optimize sequence type deserializatio...

2017-01-11 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16541 Is this a perf optimization? If yes, can you show some benchmarks? Also for codegen it's good to show the generated code before/after this change. You can get

[GitHub] spark pull request #16395: [SPARK-17075][SQL] implemented filter estimation

2017-01-11 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16395#discussion_r95527981 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala --- @@ -116,6 +116,12 @@ case class Filter
