[GitHub] spark issue #18749: [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples an...

2017-08-01 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18749 OK great. I think we should avoid breaking developer APIs, unless it has a huge upside. It wouldn't be fun to break it just for some cosmetic things ... --- If your project is set up for it, you can

[GitHub] spark issue #18749: [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples an...

2017-08-01 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18749 What is the compatibility concern?

[GitHub] spark issue #18780: [INFRA] Close stale PRs

2017-07-31 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18780 If you are asking for their opinions it'd be easier if you ask more explicitly (A vs B) in one comment, rather than asking them to go through and read the entire thread ...

[GitHub] spark issue #18752: [SPARK-21551][Python] Increase timeout for PythonRDD.ser...

2017-07-27 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18752 cc @JoshRosen

[GitHub] spark issue #18702: [SPARK-21485][SQL][DOCS] Spark SQL documentation generat...

2017-07-26 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18702 LGTM too. Merging in master.

[GitHub] spark issue #18697: [SPARK-16683][SQL] Repeated joins to same table can leak...

2017-07-25 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18697 cc @cloud-fan @hvanhovell

[GitHub] spark issue #18645: [SPARK-14280][BUILD][WIP] Update change-version.sh and p...

2017-07-23 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18645 When users upgrade from 2.11 to 2.12, their app would be broken, wouldn't it?

[GitHub] spark issue #18645: [SPARK-14280][BUILD][WIP] Update change-version.sh and p...

2017-07-23 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18645 @srowen I don't agree that we should just break source compatibility here. We have already spent a lot of time doing this in the past and figuring out how to preserve it.

[GitHub] spark issue #18715: [minor] Remove **** in test case names in FlatMapGroupsW...

2017-07-23 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18715 Wait let's ask why @tdas did it this way... On Sun, Jul 23, 2017 at 10:45 AM asfgit <notificati...@github.com> wrote: > Closed #18715 <https://github.com/apache/spark/pul

[GitHub] spark issue #18645: [SPARK-14280][BUILD][WIP] Update change-version.sh and p...

2017-07-22 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18645 It is still a source-breaking change, and this is why I was saying it would be a lot of work to upgrade to Scala 2.12 without breaking existing source code. For 2.12 we should get rid of the functions

[GitHub] spark issue #18715: [minor] Remove **** in test case names in FlatMapGroupsW...

2017-07-22 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18715 cc @tdas Was there a reason to use `****`?

[GitHub] spark pull request #18715: [minor] Remove **** in test case names in FlatMap...

2017-07-22 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/18715 [minor] Remove **** in test case names in FlatMapGroupsWithStateSuite ## What changes were proposed in this pull request? This patch removes the `****` string from test names

[GitHub] spark issue #18709: [SPARK-21504] [SQL] Add spark version info into table me...

2017-07-22 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18709 "Create Version" isn't a good user facing description. It'd make more sense to just say "Created by Spark xxx"

[GitHub] spark pull request #18714: [SPARK-20236][SQL] hive style partition overwrite

2017-07-22 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18714#discussion_r128908118 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -881,6 +881,16 @@ object SQLConf { .intConf

[GitHub] spark issue #18645: [SPARK-14280][BUILD][WIP] Update change-version.sh and p...

2017-07-22 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18645 @srowen You just showed that the Scala 2.12 changes are source breaking, isn't it?

[GitHub] spark pull request #18645: [SPARK-14280][BUILD][WIP] Update change-version.s...

2017-07-22 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18645#discussion_r128890891 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala --- @@ -353,7 +353,7 @@ class DatasetSuite extends QueryTest with SharedSQLContext

[GitHub] spark pull request #18645: [SPARK-14280][BUILD][WIP] Update change-version.s...

2017-07-22 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18645#discussion_r128890868 --- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskContextSuite.scala --- @@ -54,7 +54,10 @@ class TaskContextSuite extends SparkFunSuite

[GitHub] spark issue #18468: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

2017-07-21 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18468 Uncompress a small block at a time.

[GitHub] spark issue #18468: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

2017-07-21 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18468 Hey sorry for commenting late, but I don't think this change really makes sense. If anything, I'd decompress data in batch into uncompressed column batch, rather than building an adapter

[GitHub] spark issue #18680: [SPARK-21472][SQL] Introduce ArrowColumnVector as a read...

2017-07-20 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18680 Have you guys checked the performance of this change? It changes the number of concrete implementations for column vector from 2 to 3 (and potentially 1 to 2 at runtime). This might (or might

[GitHub] spark issue #18487: [SPARK-21243][Core] Limit no. of map outputs in a shuffl...

2017-07-19 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18487 hm is this a bug fix? if not we shouldn't cherry pick it.

[GitHub] spark issue #18306: [SPARK-21029][SS] All StreamingQuery should be stopped w...

2017-07-19 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18306 cc @zsxwing

[GitHub] spark pull request #17848: [SPARK-20586] [SQL] Add deterministic to ScalaUDF...

2017-07-19 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/17848#discussion_r128162324 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala --- @@ -103,4 +110,19 @@ case class UserDefinedFunction

[GitHub] spark pull request #17848: [SPARK-20586] [SQL] Add deterministic to ScalaUDF...

2017-07-19 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/17848#discussion_r128159939 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala --- @@ -103,4 +110,19 @@ case class UserDefinedFunction

[GitHub] spark pull request #17848: [SPARK-20586] [SQL] Add deterministic to ScalaUDF...

2017-07-19 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/17848#discussion_r128159874 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala --- @@ -103,4 +110,19 @@ case class UserDefinedFunction

[GitHub] spark pull request #17848: [SPARK-20586] [SQL] Add deterministic to ScalaUDF...

2017-07-19 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/17848#discussion_r128159780 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala --- @@ -58,6 +55,13 @@ case class UserDefinedFunction protected

[GitHub] spark issue #17150: [SPARK-19810][BUILD][CORE] Remove support for Scala 2.10

2017-07-14 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17150 Are you working on 2.12?

[GitHub] spark issue #17150: [SPARK-19810][BUILD][CORE] Remove support for Scala 2.10

2017-07-14 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17150 Do the removal (i.e. this PR).

[GitHub] spark issue #17150: [SPARK-19810][BUILD][CORE] Remove support for Scala 2.10

2017-07-14 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17150 Maybe do it a bit later, when the backport rate drops? E.g. it's unlikely we still do a lot of backports when 2.3 is cut.

[GitHub] spark issue #18606: [SPARK-21382] The note about Scala 2.10 in building-spar...

2017-07-12 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18606 It's already merged. https://github.com/apache/spark/commit/24367f23f77349a864da340573e39ab2168c5403

[GitHub] spark issue #18606: [SPARK-21382] The note about Scala 2.10 in building-spar...

2017-07-12 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18606 That's true. Merging in master.

[GitHub] spark issue #17633: [SPARK-20331][SQL] Enhanced Hive partition pruning predi...

2017-07-11 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17633 @mallman we don't backport such risky changes to maintenance branches. Those branches typically go through much less testing.

[GitHub] spark issue #18586: [SPARK-21358][Examples] Argument of repartitionandsortwi...

2017-07-10 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18586 Merging in master. Thanks.

[GitHub] spark issue #18559: [SPARK-21335][SQL] support un-aliased subquery

2017-07-07 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18559 It'd be important to document what syntaxes are no longer allowed in the JIRA ticket (and PR description), and we should also highlight that in the release notes.

[GitHub] spark pull request #18559: [SPARK-21335][SQL] support un-aliased subquery

2017-07-06 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18559#discussion_r126072754 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -2638,4 +2638,17 @@ class SQLQuerySuite extends QueryTest

[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...

2017-07-06 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r126016128 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/WindowSpec.scala --- @@ -174,28 +191,22 @@ class WindowSpec private[sql

[GitHub] spark pull request #18540: [SPARK-19451][SQL] rangeBetween method should acc...

2017-07-06 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18540#discussion_r126016260 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala --- @@ -805,4 +806,24 @@ object TypeCoercion

[GitHub] spark pull request #18159: [SPARK-20703][SQL] Associate metrics with data wr...

2017-07-06 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18159#discussion_r126015755 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala --- @@ -314,21 +339,40 @@ object FileFormatWriter extends

[GitHub] spark issue #18549: [SPARK-21323][SQL]Rename plans.logical.statsEstimation.R...

2017-07-06 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18549 Merging in master.

[GitHub] spark pull request #17633: [SPARK-20331][SQL] Enhanced Hive partition prunin...

2017-07-06 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/17633#discussion_r126013379 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala --- @@ -589,18 +590,40 @@ private[client] class Shim_v0_13 extends

[GitHub] spark issue #18307: [SPARK-21100][SQL] describe should give quartiles simila...

2017-07-05 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18307 high level change looks good to me. @aray can you update the title / description of the PR and JIRA ticket? cc @cloud-fan can you review this to make sure the implementation

[GitHub] spark issue #18494: [SPARK-21272] SortMergeJoin LeftAnti does not update num...

2017-07-01 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18494 cc @hvanhovell

[GitHub] spark pull request #18307: [SPARK-21100][SQL] describe should give quartiles...

2017-06-30 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18307#discussion_r125146093 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -2205,37 +2205,151 @@ class Dataset[T] private[sql]( * // max 92.0

[GitHub] spark pull request #18307: [SPARK-21100][SQL] describe should give quartiles...

2017-06-30 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18307#discussion_r125146112 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -2205,37 +2205,151 @@ class Dataset[T] private[sql]( * // max 92.0

[GitHub] spark pull request #18307: [SPARK-21100][SQL] describe should give quartiles...

2017-06-30 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18307#discussion_r125146063 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -2205,37 +2205,151 @@ class Dataset[T] private[sql]( * // max 92.0

[GitHub] spark issue #18479: WIP - logical plan stat propagation using mixin and visi...

2017-06-30 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18479 Funny tests actually passed. Maybe you guys can just review this. cc @gengliangwang @gatorsmile @wzhfy

[GitHub] spark issue #18307: [SPARK-21100][SQL] describe should give quartiles simila...

2017-06-30 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18307 OK then let's use summary. @aray want to do that update?

[GitHub] spark pull request #18307: [SPARK-21100][SQL] describe should give quartiles...

2017-06-30 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18307#discussion_r125095026 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -2205,37 +2205,170 @@ class Dataset[T] private[sql]( * // max 92.0

[GitHub] spark issue #18334: [SPARK-21127] [SQL] Update statistics after data changin...

2017-06-30 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18334 Can the stats be updated incrementally?

[GitHub] spark issue #18424: [SPARK-17091] Add rule to convert IN predicate to equiva...

2017-06-30 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18424 Have you done actual benchmarks to validate that this is a perf improvement?

[GitHub] spark issue #18469: [SPARK-21256] [SQL] Add withSQLConf to Catalyst Test

2017-06-30 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18469 Can we minimize the change by just adding this method to PlanTest? It's not that many lines of code.

[GitHub] spark pull request #18479: WIP - stat propagation code using mixin

2017-06-30 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/18479 WIP - stat propagation code using mixin ## What changes were proposed in this pull request? TBD ## How was this patch tested? Should be covered by existing test cases. You can merge

[GitHub] spark issue #17935: [SPARK-20690][SQL] Subqueries in FROM should have alias ...

2017-06-29 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17935 The reason I found out about this is because one of the widely circulated TPC-DS benchmark harnesses online uses this.

[GitHub] spark issue #17935: [SPARK-20690][SQL] Subqueries in FROM should have alias ...

2017-06-29 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17935 I don't think that argument is useful at all. For example, none of the other databases support the DataFrame API. Does that mean few users will write DataFrame code?

[GitHub] spark issue #17935: [SPARK-20690][SQL] Subqueries in FROM should have alias ...

2017-06-29 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17935 Other committers please revert this change until we find a solution or verify that almost no users write queries like this.

[GitHub] spark pull request #18307: [SPARK-21100][SQL] describe should give quartiles...

2017-06-29 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18307#discussion_r124932359 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -2205,37 +2205,170 @@ class Dataset[T] private[sql]( * // max 92.0

[GitHub] spark issue #17935: [SPARK-20690][SQL] Subqueries in FROM should have alias ...

2017-06-29 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17935 Also the description / title is completely different from the JIRA ticket.

[GitHub] spark issue #17935: [SPARK-20690][SQL] Subqueries in FROM should have alias ...

2017-06-29 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17935 Guys - isn't this API breaking?

[GitHub] spark issue #18301: [SPARK-21052][SQL] Add hash map metrics to join

2017-06-28 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18301 hey i didn't track super closely, but it is pretty important to show at least one more digit, e.g. 1.7, rather than just 2.
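
The request for one more digit can be illustrated with a small sketch. The function name `format_avg_probe` and the probe and lookup counts are hypothetical and are not taken from the PR; the sketch only shows how rounding an average to an integer hides the difference between 1.7 and 2.

```python
def format_avg_probe(total_probes: int, total_lookups: int) -> str:
    """Render an average hash probe count with one decimal digit."""
    if total_lookups == 0:
        # No lookups recorded yet, so report a neutral value.
        return "0.0"
    return f"{total_probes / total_lookups:.1f}"


# Hypothetical counters: 17 probes spread over 10 lookups.
one_decimal = format_avg_probe(17, 10)   # keeps the fractional part
integer_only = str(round(17 / 10))       # what an integer metric would display
```

With these made-up counters the one-decimal form distinguishes an average of 1.7 from one of 2.0, which an integer metric would display identically.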

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-06-28 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15821 In the future we should revert PRs that fail builds IMMEDIATELY. There is no way we should've let the build be broken for days.

[GitHub] spark pull request #18429: [SPARK-21222] Move elimination of Distinct clause...

2017-06-27 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18429#discussion_r124457557 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -152,6 +153,19 @@ abstract class Optimizer

[GitHub] spark pull request #18429: [SPARK-21222] Move elimination of Distinct clause...

2017-06-27 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18429#discussion_r124455032 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -152,6 +153,19 @@ abstract class Optimizer

[GitHub] spark pull request #18429: [SPARK-21222] Move elimination of Distinct clause...

2017-06-27 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18429#discussion_r124455104 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -152,6 +153,19 @@ abstract class Optimizer

[GitHub] spark pull request #18429: [SPARK-21222] Move elimination of Distinct clause...

2017-06-27 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18429#discussion_r124452275 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -152,6 +153,19 @@ abstract class Optimizer

[GitHub] spark pull request #18429: [SPARK-21222] Move elimination of Distinct clause...

2017-06-26 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18429#discussion_r124177929 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateDistinceSuite.scala --- @@ -0,0 +1,56 @@ +/* + * Licensed

[GitHub] spark issue #18368: [SPARK-21102][SQL] Make refresh resource command less ag...

2017-06-26 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18368 Jenkins, retest this please.

[GitHub] spark issue #18395: [SPARK-20655][core] In-memory KVStore implementation.

2017-06-26 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18395 Is this going to be exposed? Either way, we should find something like spark.util.kvstore package rather than a top level package.

[GitHub] spark issue #18042: [SPARK-20817][core] Fix to return "Unknown processor" on...

2017-06-23 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18042 Please let's not waste more time here. I don't think the gain is worth the effort required (or even the discussions here).

[GitHub] spark issue #18377: [SPARK-18016][SQL][CATALYST][BRANCH-2.2] Code Generation...

2017-06-23 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18377 Hm I'm not even sure if we should backport this in branch-2.2. Let's wait and see ...

[GitHub] spark issue #18387: [SPARK-21174] [SQL] Validate sampling fraction in logica...

2017-06-23 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18387 What about CheckAnalysis?

[GitHub] spark issue #18387: [SPARK-21174] [SQL] Validate sampling fraction in logica...

2017-06-22 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18387 hm should we do this? It'd make more sense to throw an analyzer error, rather than some deep call stack that's coming from an operator.
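
The idea of failing during analysis instead of deep inside an operator can be sketched as follows. This is not Spark code: `Sample`, `AnalysisError`, `check_analysis`, and the assumed valid range of [0, 1] are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


class AnalysisError(Exception):
    """Raised while checking a plan, before any execution starts."""


@dataclass(frozen=True)
class Sample:
    """Hypothetical logical plan node that samples a fraction of its input."""
    fraction: float


def check_analysis(node: Sample) -> Sample:
    """Validate the plan node up front so the user sees a short, direct error."""
    # Assumption for this sketch: a valid fraction lies in [0, 1].
    if not 0.0 <= node.fraction <= 1.0:
        raise AnalysisError(
            f"Sampling fraction ({node.fraction}) must be in the interval [0, 1]"
        )
    return node
```

Because the check runs on the plan node itself, an invalid fraction is rejected with a single descriptive error before any operator is constructed or executed.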

[GitHub] spark issue #18377: [SPARK-18016][SQL][CATALYST][BRANCH-2.2] Code Generation...

2017-06-22 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18377 Why did we backport this? This seems too risky.

[GitHub] spark issue #18310: [SPARK-21103][SQL] QueryPlanConstraints should be part o...

2017-06-20 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18310 Merging in master.

[GitHub] spark issue #18343: [SPARK-21133][CORE] Fix HighlyCompressedMapStatus#writeE...

2017-06-19 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18343 I was talking about the classname for the internal members.

[GitHub] spark issue #18343: [SPARK-21133][CORE] Fix HighlyCompressedMapStatus#writeE...

2017-06-19 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18343 It's obvious it will reduce data size with custom serialization, since the custom logic doesn't need to write the full classname out which the java default one does. I don't think Kryo knows
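
The size argument in the comment above can be sketched by analogy. The example below is Python, not Java or Kryo, and every name in it (`MapStatusLike`, `QUALIFIED_NAME`, the field layout) is hypothetical. It only contrasts a self-describing encoding that writes a class name ahead of the payload with a custom fixed layout that writes the fields alone.

```python
import struct
from dataclasses import dataclass

# Hypothetical fully qualified class name that a self-describing format would embed.
QUALIFIED_NAME = "org.example.MapStatusLike"


@dataclass
class MapStatusLike:
    """Hypothetical stand-in for a small status record; not Spark's class."""
    num_non_empty_blocks: int
    avg_size: int


def custom_bytes(status: MapStatusLike) -> bytes:
    """Custom layout: a 4-byte int then an 8-byte long, big-endian, no type metadata."""
    return struct.pack(">iq", status.num_non_empty_blocks, status.avg_size)


def self_describing_bytes(status: MapStatusLike) -> bytes:
    """Same payload, preceded by a length-prefixed class name."""
    name = QUALIFIED_NAME.encode("utf-8")
    return struct.pack(">H", len(name)) + name + custom_bytes(status)


def custom_read(data: bytes) -> MapStatusLike:
    """Decode the custom layout; the reader must already know the type."""
    blocks, avg = struct.unpack(">iq", data)
    return MapStatusLike(blocks, avg)


status = MapStatusLike(num_non_empty_blocks=3, avg_size=4096)
overhead = len(self_describing_bytes(status)) - len(custom_bytes(status))
```

In this sketch the overhead is exactly the length prefix plus the encoded name, and it is paid per record unless the format deduplicates class descriptors. The trade-off is that the custom reader has to know the type in advance.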

[GitHub] spark issue #18307: [SPARK-21100][SQL] describe should give quartiles simila...

2017-06-15 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18307 My worry is that now the default performance will be slow. Maybe this flag can be off by default?

[GitHub] spark pull request #18310: [SPARK-21103][SQL] QueryPlanConstraints should be...

2017-06-15 Thread rxin
GitHub user rxin reopened a pull request: https://github.com/apache/spark/pull/18310 [SPARK-21103][SQL] QueryPlanConstraints should be part of LogicalPlan ## What changes were proposed in this pull request? QueryPlanConstraints should be part of LogicalPlan, rather than

[GitHub] spark issue #18301: [SPARK-21052][SQL] Add hash map metrics to join

2017-06-15 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18301 also the avg probe probably shouldn't be an integer. at least we should show something like 1.9?

[GitHub] spark issue #18301: [SPARK-21052][SQL] Add hash map metrics to join

2017-06-15 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18301 yes but i just feel it is getting very long and verbose ..

[GitHub] spark issue #18301: [SPARK-21052][SQL] Add hash map metrics to join

2017-06-15 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18301 I'd shorten it to "avg hash probe". Also do we really need min, med, max? Maybe just a single global avg?

[GitHub] spark pull request #18301: [SPARK-21052][SQL] Add hash map metrics to join

2017-06-15 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18301#discussion_r122128307 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala --- @@ -573,8 +586,11 @@ private[execution] final class

[GitHub] spark issue #18299: [SPARK-21092][SQL] Wire SQLConf in logical plan and expr...

2017-06-14 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18299 Merging in master.

[GitHub] spark pull request #18310: [SPARK-21103][SQL] QueryPlanConstraints should be...

2017-06-14 Thread rxin
Github user rxin closed the pull request at: https://github.com/apache/spark/pull/18310

[GitHub] spark issue #18310: [SPARK-21103][SQL] QueryPlanConstraints should be part o...

2017-06-14 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18310 Closing for now, since @sameeragarwal said it might be useful in physical planning in the future.

[GitHub] spark issue #18310: [SPARK-21103][SQL] QueryPlanConstraints should be part o...

2017-06-14 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18310 This currently includes all the changes from https://github.com/apache/spark/pull/18299, but only the last commit matters.

[GitHub] spark pull request #18310: [SPARK-21103][SQL] QueryPlanConstraints should be...

2017-06-14 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/18310 [SPARK-21103][SQL] QueryPlanConstraints should be part of LogicalPlan ## What changes were proposed in this pull request? QueryPlanConstraints should be part of LogicalPlan, rather than QueryPlan

[GitHub] spark issue #18301: [SPARK-21052][SQL] Add hash map metrics to join

2017-06-14 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18301 Can you put a screenshot of the UI up, for both join and aggregate?

[GitHub] spark issue #18307: [SPARK-21100][SQL] describe should give quartiles simila...

2017-06-14 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18307 What's the perf impact here? My worry is that we will significantly slow down describe ...

[GitHub] spark pull request #18299: [SPARK-21092][SQL] Wire SQLConf in logical plan a...

2017-06-14 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18299#discussion_r122072883 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlanConstraints.scala --- @@ -27,18 +27,20 @@ trait QueryPlanConstraints

[GitHub] spark issue #18299: [SPARK-21092][SQL] Wire SQLConf in logical plan and expr...

2017-06-14 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18299 The issue is that SparkSession might change the way they are wired and it's not always the case that when we create a new thread, we set the thread local conf.
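The thread-local concern can be sketched in a few lines of Python. This is an analogy only, assuming a thread-local holder for the "active conf"; the names `set_active_conf`, `get_active_conf`, and the `caseSensitive` key are hypothetical and are not Spark's API. It shows that a value set in one thread is not visible in a newly created thread unless that thread sets it again.

```python
import threading

# Hypothetical stand-in for a thread-local "active conf" holder; not Spark's API.
_active_conf = threading.local()


def set_active_conf(conf):
    """Store the conf for the current thread only."""
    _active_conf.value = conf


def get_active_conf(default=None):
    """Return the current thread's conf, or `default` if this thread never set one."""
    return getattr(_active_conf, "value", default)


def conf_seen_by_new_thread():
    """Start a fresh thread and report which conf it observes."""
    seen = []
    worker = threading.Thread(target=lambda: seen.append(get_active_conf()))
    worker.start()
    worker.join()
    return seen[0]


set_active_conf({"caseSensitive": True})
parent_view = get_active_conf()          # the thread that set the conf sees it
child_view = conf_seen_by_new_thread()   # a new thread starts with nothing set

print(parent_view)  # {'caseSensitive': True}
print(child_view)   # None
```

Any code that reads the conf from a new thread therefore needs the conf set explicitly in that thread, or a fallback when it is missing.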

[GitHub] spark issue #18306: [SPARK-21029][SS] All StreamingQuery should be stopped w...

2017-06-14 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18306 Is this safe to do @marmbrus ?

[GitHub] spark issue #18298: [SPARK-21091][SQL] Move constraint code into QueryPlanCo...

2017-06-14 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18298 Merging in master.

[GitHub] spark pull request #18298: [SPARK-21091][SQL] Move constraint code into Quer...

2017-06-14 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18298#discussion_r122008512 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlanConstraints.scala --- @@ -0,0 +1,206 @@ +/* + * Licensed

[GitHub] spark issue #18299: [SPARK-21092][SQL] Wire SQLConf in logical plan and expr...

2017-06-14 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18299 cc @wzhfy

[GitHub] spark pull request #18299: Spark 21092

2017-06-14 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/18299 Spark 21092 ## What changes were proposed in this pull request? It is really painful to not have configs in logical plan and expressions. We had to add all sorts of hacks (e.g. pass SQLConf

[GitHub] spark issue #18299: [SPARK-21092][SQL] Wire SQLConf in logical plan and expr...

2017-06-14 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18299 Note that this patch is based on https://github.com/apache/spark/pull/18298. Once we merge that one the diff will become smaller.

[GitHub] spark pull request #18298: [SPARK-21091][SQL] Move constraint code into Quer...

2017-06-14 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/18298 [SPARK-21091][SQL] Move constraint code into QueryPlanConstraints ## What changes were proposed in this pull request? This patch moves constraint related code into a separate trait

[GitHub] spark pull request #18298: [SPARK-21091][SQL] Move constraint code into Quer...

2017-06-14 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18298#discussion_r121865658 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlanConstraints.scala --- @@ -0,0 +1,206 @@ +/* + * Licensed

[GitHub] spark pull request #15821: [SPARK-13534][PySpark] Using Apache Arrow to incr...

2017-06-13 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/15821#discussion_r121729635 --- Diff: pom.xml --- @@ -1871,6 +1872,25 @@ paranamer ${paranamer.version} + +org.apache.arrow

[GitHub] spark issue #18260: [SPARK-21046][SQL] simplify the array offset and length ...

2017-06-12 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18260 Why are we doing this? Isn't it better potentially for compression to store them separately? We can also easily remove the offset for fixed length arrays.
